Abstract. We give the first algorithm that is both query-efficient and time-efficient
for testing whether an unknown function $f : \{0,1\}^n \to \{0,1\}$ is an $s$-sparse
GF(2) polynomial versus $\epsilon$-far from every such polynomial. Our algorithm makes
$\mathrm{poly}(s, 1/\epsilon)$ black-box queries to $f$ and runs in time $n \cdot \mathrm{poly}(s, 1/\epsilon)$. The only
previous algorithm for this testing problem [DLM+07] used $\mathrm{poly}(s, 1/\epsilon)$ queries,
but had running time exponential in $s$ and super-polynomial in $1/\epsilon$.
Our approach significantly extends the "testing by implicit learning" methodology
of [DLM+07]. The learning component of that earlier work was a brute-force
exhaustive search over a concept class to find a hypothesis consistent with
a sample of random examples. In this work, the learning component is a sophisticated
exact learning algorithm for sparse GF(2) polynomials due to Schapire
and Sellie [SS96]. A crucial element of this work, which enables us to simulate
the membership queries required by [SS96], is an analysis establishing new
properties of how sparse GF(2) polynomials simplify under certain restrictions
of "low-influence" sets of variables.

1 Introduction 

Background and motivation. Given black-box access to an unknown function
$f : \{0,1\}^n \to \{0,1\}$, a natural question to ask is whether the function has a particular form.
Is it representable by a small decision tree, or small circuit, or sparse polynomial? In
the field of computational learning theory, the standard approach to this problem is to
assume that $f$ belongs to a specific class $C$ of functions of interest, and the goal is to
identify or approximate $f$. In contrast, in property testing nothing is assumed about the
unknown function $f$, and the goal of the testing algorithm is to output "yes" with high
probability if $f \in C$ and "no" with high probability if $f$ is $\epsilon$-far from every $g \in C$.
(Here the distance between two functions $f, g$ is measured with respect to the uniform
distribution on $\{0,1\}^n$, so $f$ and $g$ are $\epsilon$-far if they disagree on more than an $\epsilon$ fraction
of all inputs.) The complexity of a testing algorithm is measured both in terms of the
number of black-box queries it makes to $f$ (query complexity) as well as the time it
takes to process the results of those queries (time complexity).
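
As a concrete illustration of this distance measure, the following minimal Python sketch computes the exact distance between two functions on $\{0,1\}^n$ by enumerating all $2^n$ inputs, which is feasible only for small $n$. The helper name `distance` and the example functions are illustrative assumptions, not part of the paper.

```python
from itertools import product

def distance(f, g, n):
    """Fraction of inputs in {0,1}^n on which f and g disagree (uniform distribution)."""
    points = list(product((0, 1), repeat=n))
    return sum(f(x) != g(x) for x in points) / len(points)

# Hypothetical example functions on n = 3 variables.
parity = lambda x: x[0] ^ x[1]         # x0 + x1 over GF(2)
dictator = lambda x: x[0]              # x0
and3 = lambda x: x[0] & x[1] & x[2]    # x0 * x1 * x2
zero = lambda x: 0                     # the constant-0 function

print(distance(parity, dictator, 3))   # the two disagree exactly when x1 = 1
print(distance(and3, zero, 3))         # the two disagree only on the all-ones input
```

A tester, of course, cannot afford this enumeration; it only sees $f$ through a small number of queries.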

There are many connections between learning theory and testing, and a growing
body of work relating the two fields (see [Ron07] and its references). Testing algorithms
have been given for a range of different function classes such as linear functions over
GF(2) (i.e. parities) [BLR93]; degree-$d$ GF(2) polynomials [AKK+03]; Boolean literals,
conjunctions, and $s$-term monotone DNF formulas [PRS02]; $k$-juntas (i.e. functions
which depend on at most $k$ variables) [FKR+04]; halfspaces [MORS07]; and more.

Recently, Diakonikolas et al. [DLM+07] gave a general technique, called "testing
by implicit learning," which they used to test a variety of different function classes
that were not previously known to be testable. Intuitively, these classes correspond to
functions with "concise representations," such as $s$-term DNFs, size-$s$ Boolean formulas,
size-$s$ Boolean circuits, and $s$-sparse polynomials over constant-size finite fields.
For each of these classes, the testing algorithm of [DLM+07] makes only $\mathrm{poly}(s, 1/\epsilon)$
queries (independent of $n$).

The main drawback of the [DLM+07] testing algorithm is its time complexity. For
each of the classes mentioned above, the algorithm's running time is $2^{\omega(s)}$ as a function
of $s$, and $\omega(\mathrm{poly}(1/\epsilon))$ as a function of $\epsilon$.¹ Thus, a natural question asked by
[DLM+07] is whether any of these classes can be tested with both time complexity and
query complexity $\mathrm{poly}(s, 1/\epsilon)$.

Our result: efficiently testing sparse GF(2) polynomials. In this paper we focus on
the class of $s$-sparse polynomials over GF(2). Polynomials over GF(2) (equivalently,
parities of ANDs of input variables) are a simple and well-studied representation for
Boolean functions. It is well known that every Boolean function $f$ has a unique representation
as a multilinear polynomial over GF(2), so the sparsity (number of monomials)
of this polynomial is a very natural measure of the complexity of $f$. Sparse GF(2)
polynomials have been studied by many authors from a range of different perspectives
such as learning [BS90,FS92,SS96,Bsh97a,BM02], approximation and interpolation
[Kar89,GKS90,RB91], the complexity of (approximate) counting [EK89,KL93,LVW93],
and property testing [DLM+07].

The main result of this paper is a testing algorithm for $s$-sparse GF(2) polynomials
that is both time-efficient and query-efficient:

Theorem 1. There is a $\mathrm{poly}(s, 1/\epsilon)$-query algorithm with the following performance
guarantee: given parameters $s, \epsilon$ and black-box access to any $f : \{0,1\}^n \to \{0,1\}$, it
runs in time $\mathrm{poly}(s, 1/\epsilon)$ and tests whether $f$ is an $s$-sparse GF(2) polynomial versus
$\epsilon$-far from every $s$-sparse polynomial.

This answers the question of [DLM+07] by exhibiting an interesting and natural 
class of functions with "concise representations" that can be tested efficiently, both in 
terms of query complexity and running time. 

We obtain our main result by extending the "testing by implicit learning" approach
of [DLM+07]. In that work the "implicit learning" step used a naive brute-force search
for a consistent hypothesis; in this paper we employ a sophisticated proper learning
algorithm due to Schapire and Sellie [SS96]. It is much more difficult to "implicitly" run
the [SS96] algorithm than the brute-force search of [DLM+07]. One of the main technical
contributions of this paper is a new structural theorem about how $s$-sparse GF(2)
polynomials are affected by certain carefully chosen restrictions; this is an essential
ingredient that enables us to use the [SS96] algorithm. We elaborate on this below.

Techniques. We begin with a brief review of the main ideas of [DLM+07]. The approach
of [DLM+07] builds on the observation of Goldreich et al. [GGR98] that any
proper learning algorithm for a function class $C$ can be used as a testing algorithm for
$C$. (Recall that a proper learning algorithm for $C$ is one which outputs a hypothesis $h$
that itself belongs to $C$.) The idea behind this observation is that if the function $f$ being
tested belongs to $C$ then a proper learning algorithm will succeed in constructing a
hypothesis that is close to $f$, while if $f$ is $\epsilon$-far from every $g \in C$ then any hypothesis
$h \in C$ that the learning algorithm outputs must necessarily be far from $f$. Thus any
class $C$ can be tested to accuracy $\epsilon$ using essentially the same number of queries that are
required to properly learn the class to accuracy $\Theta(\epsilon)$.

¹ We note that the algorithm also has a linear running time dependence on $n$, the number of
input variables; this is in some sense inevitable since the algorithm must set $n$ bit values just to
pose a black-box query to $f$. Our algorithm has running time linear in $n$ for the same reason.
For the rest of the paper we discuss the running time only as a function of $s$ and $\epsilon$.

The basic approach of [GGR98] did not yield query-efficient testing algorithms
(with query complexity independent of $n$) since virtually every interesting class of functions
over $\{0,1\}^n$ requires $\Omega(\log n)$ examples for proper learning. However, [DLM+07]
showed that for many classes of functions defined by a size parameter $s$, it is possible
to "implicitly" run a (very naive) proper learning algorithm over a number of variables
that is independent of $n$, and thus obtain an overall query complexity independent of $n$.
More precisely, they first observed that for many classes $C$ every $f \in C$ is "very close"
to a function $f' \in C$ for which the number $r$ of relevant variables is polynomial in $s$
and independent of $n$; roughly speaking, the relevant variables for $f'$ are the variables
that have high influence in $f$. (For example, if $f$ is an $s$-sparse GF(2) polynomial,
an easy argument shows that there is a function $f'$ - obtained by discarding from $f$
all monomials of degree more than $\log(s/\tau)$ - that is $\tau$-close to $f$ and depends on at
most $r = s \log(s/\tau)$ variables.) They then showed how, using ideas of Fischer et al.
[FKR+04] for testing juntas, it is possible to construct a sample of uniform random examples
over $\{0,1\}^r$ which with high probability are all labeled according to $f'$. At this
point, the proper learning algorithm employed by [DLM+07] was a naive brute-force
search. The algorithm tried all possible functions in $C$ over $r$ (as opposed to $n$) variables,
to see if any were consistent with the labeled sample. [DLM+07] thus obtained a
testing algorithm with overall query complexity $\mathrm{poly}(s/\epsilon)$ but whose running time was
dominated by the brute-force search. For the class of $s$-sparse GF(2) polynomials, their
algorithm used $\tilde{O}(s^4/\epsilon^2)$ queries but had running time at least $2^{\omega(s)} \cdot (1/\epsilon)^{\log\log(1/\epsilon)}$.
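
The "easy argument" behind the truncation example can be checked numerically on a toy instance. In the sketch below a polynomial is stored as a set of monomials, each a frozenset of variable indices; this representation, the helper names, the base-2 logarithm, and the example polynomial are all assumptions made for illustration. A monomial of degree $d$ is nonzero on a $2^{-d}$ fraction of inputs, so each discarded monomial contributes less than $\tau/s$ to the distance, and a union bound over at most $s$ monomials gives $\tau$.

```python
import math
from itertools import product

def evaluate(poly, x):
    """Parity of the monomials of poly that are satisfied by the bit vector x."""
    return sum(all(x[i] for i in m) for m in poly) % 2

def truncate(poly, s, tau):
    """Discard every monomial of degree more than log2(s / tau)."""
    cutoff = math.log2(s / tau)
    return {m for m in poly if len(m) <= cutoff}

def distance(p, q, n):
    """Exact fraction of inputs in {0,1}^n on which p and q disagree."""
    points = list(product((0, 1), repeat=n))
    return sum(evaluate(p, x) != evaluate(q, x) for x in points) / len(points)

# Hypothetical 3-sparse polynomial on 6 variables: x0 + x1*x2 + x0*x1*x2*x3*x4*x5.
n, s, tau = 6, 3, 0.5
f = {frozenset({0}), frozenset({1, 2}), frozenset(range(6))}
f_prime = truncate(f, s, tau)             # the degree-6 monomial is dropped
relevant = set().union(*f_prime)          # variables that f' can depend on

print(distance(f, f_prime, n))            # 1/64: they differ only on the all-ones input
print(len(relevant), s * math.log2(s / tau))
```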

Current approach. The high-level idea of the current work is to employ a much more
sophisticated - and efficient - proper learning algorithm than brute-force search. In particular
we would like to use a proper learning algorithm which, when applied to learn
a function over only $r$ variables, runs in time polynomial in $r$ and in the size parameter
$s$. For the class of $s$-sparse GF(2) polynomials, precisely such an algorithm was
given by Schapire and Sellie [SS96]. Their algorithm, which we describe in Section 2.1,
is computationally efficient and generates a hypothesis $h$ which is an $s$-sparse GF(2)
polynomial. But this power comes at a price: the algorithm requires access to a membership
query oracle, i.e. a black-box oracle for the function being learned. Thus, in order
to run the Schapire/Sellie algorithm in the "testing by implicit learning" framework, it is
necessary to simulate membership queries to an approximating function $f' \in C$ which
is close to $f$ but depends on only $r$ variables. This is significantly more challenging
than generating uniform random examples labeled according to $f'$, which is all that is
required in the original [DLM+07] approach.

To see why membership queries to $f'$ are more difficult to simulate than uniform
random examples, recall that $f$ and the $f'$ described above (obtained from $f$ by discarding
high-degree monomials) are $\tau$-close. Intuitively this is extremely close, disagreeing
only on a $1/m$ fraction of inputs for an $m$ that is much larger than the number of random
examples required for learning $f'$ via brute-force search (this number is "small"
- independent of $n$ - because $f'$ depends on only $r$ variables). Thus in the [DLM+07]
approach it suffices to use $f$, the function to which we actually have black-box access,
rather than $f'$ to label the random examples used for learning $f'$; since $f$ and $f'$ are
so close, and the examples are uniformly random, with high probability all the labels
will also be correct for $f'$. However, in the membership query scenario of the current
paper, things are no longer that simple. For any given $f'$ which is close to $f$, one can no
longer assume that the learning algorithm's queries to $f'$ are uniformly distributed and
hence unlikely to hit the error region - indeed, it is possible that the learning algorithm's
membership queries to $f'$ are clustered on the few inputs where $f$ and $f'$ disagree.

In order to successfully simulate membership queries, we must somehow consistently
answer queries according to a particular $f'$, even though we only have oracle
access to $f$. Moreover this must be done implicitly in a query-efficient way, since explicitly
identifying even a single variable relevant to $f'$ requires at least $\Omega(\log n)$ queries.
This is the main technical challenge in the paper.

We meet this challenge by showing that for any $s$-sparse polynomial $f$, an approximating
$f'$ can be obtained as a restriction of $f$ by setting certain carefully chosen
subsets of variables to zero. Roughly speaking, this restriction is obtained by randomly
partitioning all of the input variables into $r$ subsets and zeroing out all subsets whose
variables have small "collective influence" (more precisely, small variation in the sense
of [FKR+04]). It is important that the restriction sets these variables to zero, rather than
a random assignment; intuitively this is because setting a variable to zero "kills" all
monomials that contain the variable, whereas setting it to 1 does not. Our main technical
theorem (Theorem 3, given in Section 3) shows that this $f'$ is indeed close to $f$ and
has at most one of its relevant variables in each of the surviving subsets. We moreover
show that these relevant variables for $f'$ all have high influence in $f$ (the converse is
not true; examples can be given which show that not every variable that has "high influence"
in $f$ will in general become a relevant variable for $f'$). This property is important
in enabling our simulation of membership queries. In addition to the crucial role that
Theorem 3 plays in the completeness proof for our test, we feel that the new insights
the theorem gives into how sparse polynomials "simplify" under (appropriately defined)
random restrictions may be of independent interest.
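
The asymmetry between fixing a variable to 0 and fixing it to 1 is easy to see on a monomial-set representation. The following sketch is only an illustration under assumed conventions (a polynomial is a set of frozensets of variable indices, and the helper names are hypothetical): restricting to 0 deletes every monomial that touches a zeroed variable, while restricting to 1 merely shortens monomials, which can leave the sparsity unchanged or cause cancellations modulo 2.

```python
def restrict_to_zero(poly, zeroed):
    """Fix every variable in `zeroed` to 0: each monomial containing one of them vanishes."""
    return {m for m in poly if not (m & zeroed)}

def restrict_to_one(poly, ones):
    """Fix every variable in `ones` to 1: monomials shrink, and equal monomials cancel mod 2."""
    result = set()
    for m in poly:
        result ^= {m - ones}   # symmetric difference implements addition over GF(2)
    return result

# Hypothetical polynomial p = x0*x1 + x1*x2 + x3.
p = {frozenset({0, 1}), frozenset({1, 2}), frozenset({3})}

print(restrict_to_zero(p, frozenset({1})))   # only x3 survives
print(restrict_to_one(p, frozenset({1})))    # x0 + x2 + x3: still three monomials

# Setting a variable to 1 can also cancel terms: x0*x1 + x0 becomes x0 + x0 = 0.
q = {frozenset({0, 1}), frozenset({0})}
print(restrict_to_one(q, frozenset({1})))    # the empty set, i.e. the zero polynomial
```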

Organization. In Section 4, we present our testing algorithm, Test-Sparse-Poly, along 
with a high-level description and sketch of correctness. In Section 2.1 we describe in 
detail the "learning component" of the algorithm. In Section 3 we state Theorem 3, 
which provides intuition behind the algorithm and serves as the main technical tool in 
the completeness proof. Due to space limitations, the proof of Theorem 3 is presented in 
Appendix A, while the completeness and soundness proofs are given in Appendices B 
and C, respectively (see full version available online). 

2 Preliminaries and Background 

GF(2) Polynomials: A GF(2) polynomial is a parity of monotone conjunctions (monomials).
It is $s$-sparse if it contains at most $s$ monomials (including the constant-1
monomial if it is present). The length of a monomial is the number of distinct variables that
occur in it; over GF(2), this is simply its degree.
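
For readers who prefer code, one possible way to represent such polynomials is sketched below. The representation (a set of monomials, each a frozenset of variable indices, with the empty frozenset standing for the constant-1 monomial) and the function names are assumptions chosen for illustration, not notation from the paper.

```python
def evaluate(poly, x):
    """Value in {0,1} of the polynomial at x: the parity of its satisfied monomials."""
    return sum(all(x[i] for i in m) for m in poly) % 2

def sparsity(poly):
    """Number of monomials, counting the constant-1 monomial if present."""
    return len(poly)

def degree(poly):
    """Largest monomial length (0 for a constant polynomial)."""
    return max((len(m) for m in poly), default=0)

# Hypothetical example: p(x) = 1 + x0*x1 + x2 over GF(2).
p = {frozenset(), frozenset({0, 1}), frozenset({2})}

print(evaluate(p, (0, 0, 0)))   # only the constant monomial is satisfied
print(evaluate(p, (1, 1, 0)))   # constant and x0*x1 are satisfied, so the parity is even
print(sparsity(p), degree(p))
```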

Notation: For $i \in \mathbb{N}^*$, denote $[i] \stackrel{\text{def}}{=} \{1, 2, \ldots, i\}$. It will be convenient to view the
output range of a Boolean function $f$ as $\{-1, 1\}$ rather than $\{0, 1\}$, i.e. $f : \{0,1\}^n \to
\{-1, 1\}$. We view the hypercube as a measure space endowed with the uniform product
probability measure. For $I \subseteq [n]$ we denote by $\{0,1\}^I$ the set of all partial assignments
to the coordinates in $I$. For $w \in \{0,1\}^{[n] \setminus I}$ and $z \in \{0,1\}^I$, we write $w \sqcup z$ to denote
the assignment in $\{0,1\}^n$ whose $i$-th coordinate is $w_i$ if $i \in [n] \setminus I$ and is $z_i$ if $i \in I$.
Whenever an element $z$ in $\{0,1\}^I$ is chosen randomly (we denote $z \in_R \{0,1\}^I$), it is
chosen with respect to the uniform measure on $\{0,1\}^I$.

Influence, Variation and the Independence Test: Recall the classical notion of influence
[KKL88]: The influence of the $i$-th coordinate on $f : \{0,1\}^n \to \{-1,1\}$ is
$\mathrm{Inf}_i(f) \stackrel{\text{def}}{=} \Pr_{x \in_R \{0,1\}^n}[f(x) \neq f(x^{\oplus i})]$, where $x^{\oplus i}$ denotes $x$ with the $i$-th bit flipped.
The following generalization of influence, the variation of a subset of the coordinates
of a Boolean function, plays an important role for us:

Definition 1 (variation, [FKR+04]). Let $f : \{0,1\}^n \to \{-1,1\}$, and let $I \subseteq [n]$. We
define the variation of $f$ on $I$ as $\mathrm{Vr}_f(I) \stackrel{\text{def}}{=} \mathbf{E}_{w \in_R \{0,1\}^{[n] \setminus I}}\big[\mathbf{V}_{z \in_R \{0,1\}^I}[f(w \sqcup z)]\big]$.

When $I = \{i\}$ we will sometimes write $\mathrm{Vr}_f(i)$ instead of $\mathrm{Vr}_f(\{i\})$. It is easy to
check that $\mathrm{Vr}_f(i) = \mathrm{Inf}_i(f)$, so variation is indeed a generalization of influence. Intuitively,
the variation is a measure of the ability of a set of variables to sway a function's
output. The following two simple properties of the variation will be useful for the analysis
of our testing algorithm:

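Lemma 1 (monotonicity and sub-additivity, [FKR+04]). Let $f : \{0,1\}^n \to \{-1,1\}$
and $A, B \subseteq [n]$. Then $\mathrm{Vr}_f(A) \le \mathrm{Vr}_f(A \cup B) \le \mathrm{Vr}_f(A) + \mathrm{Vr}_f(B)$.

Lemma 2 (probability of detection, [FKR+04]). Let $f : \{0,1\}^n \to \{-1,1\}$ and
$I \subseteq [n]$. If $w \in_R \{0,1\}^{[n] \setminus I}$ and $z_1, z_2 \in_R \{0,1\}^I$ are chosen independently, then
$\Pr[f(w \sqcup z_1) \neq f(w \sqcup z_2)] = \frac{1}{2} \mathrm{Vr}_f(I)$.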

We now recall the independence test from [FKR+04], a simple two-query test used
to determine whether a function $f$ is independent of a given set $I \subseteq [n]$ of coordinates.

Independence test: Given $f : \{0,1\}^n \to \{-1,1\}$ and $I \subseteq [n]$, choose $w \in_R \{0,1\}^{[n] \setminus I}$
and $z_1, z_2 \in_R \{0,1\}^I$ independently. Accept if $f(w \sqcup z_1) = f(w \sqcup z_2)$ and reject if
$f(w \sqcup z_1) \neq f(w \sqcup z_2)$.

Lemma 2 implies that the independence test rejects with probability exactly $\frac{1}{2} \mathrm{Vr}_f(I)$.
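
Since the test rejects with probability exactly half the variation, doubling the empirical rejection rate gives a natural estimator of the variation. The sketch below is a minimal illustration under assumed conventions: variables are indexed from 0, `f` takes a list of bits and returns a value in {-1, 1}, and the function names and the example function are hypothetical.

```python
import random

def independence_test(f, n, I, rng):
    """One run of the two-query test; returns True on accept and False on reject."""
    w = [rng.randint(0, 1) for _ in range(n)]   # shared random setting outside I
    x1, x2 = list(w), list(w)
    for i in I:                                  # two independent random settings of I
        x1[i] = rng.randint(0, 1)
        x2[i] = rng.randint(0, 1)
    return f(x1) == f(x2)

def estimate_variation(f, n, I, runs, rng):
    """Estimate the variation of f on I as 2 * (fraction of runs that reject)."""
    rejects = sum(not independence_test(f, n, I, rng) for _ in range(runs))
    return 2 * rejects / runs

# Hypothetical example: f depends only on x0, so the variation of {0} is 1
# and the variation of {1, 2} is 0.
f = lambda x: -1 if x[0] else 1
rng = random.Random(0)
print(estimate_variation(f, 4, [0], 2000, rng))      # close to 1
print(estimate_variation(f, 4, [1, 2], 2000, rng))   # exactly 0: the test never rejects
```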

Random Partitions: Throughout the paper we will use the following notion of a random
partition of the set $[n]$ of input coordinates:

Definition 2. A random partition of $[n]$ into $r$ subsets $\{I_j\}_{j=1}^r$ is constructed by independently
assigning each $i \in [n]$ to a randomly chosen $I_j$ for some $j \in [r]$.
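
A random partition in this sense takes only a few lines of code. The sketch below assumes 0-based indices and a uniformly random choice of subset for each coordinate; the function name is hypothetical. Note that some of the $r$ subsets may be empty.

```python
import random

def random_partition(n, r, rng):
    """Assign each coordinate in range(n) independently to one of r subsets."""
    parts = [[] for _ in range(r)]
    for i in range(n):
        parts[rng.randrange(r)].append(i)
    return parts

parts = random_partition(20, 5, random.Random(0))
print(parts)
```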




We now define the notion of low- and high-variation subsets with respect to a partition
of the set $[n]$ and a parameter $\alpha > 0$.

Definition 3. For $f : \{0,1\}^n \to \{-1,1\}$, a partition of $[n]$ into $\{I_j\}_{j=1}^r$ and a parameter
$\alpha > 0$, define $L(\alpha) \stackrel{\text{def}}{=} \{j \in [r] \mid \mathrm{Vr}_f(I_j) < \alpha\}$ (low-variation subsets) and
$H(\alpha) \stackrel{\text{def}}{=} [r] \setminus L(\alpha)$ (high-variation subsets). For $j \in [r]$ and $i \in I_j$, if $\mathrm{Vr}_f(i) > \alpha$ we
say that the variable $x_i$ is a high-variation element of $I_j$.

Finally, the notion of a well-structured subset will be important for us:

Definition 4. For $f : \{0,1\}^n \to \{-1,1\}$ and parameters $\alpha > \Delta > 0$, we say that a
subset $I \subseteq [n]$ of coordinates is $(\alpha, \Delta)$-well structured if there is an $i \in I$ such that
$\mathrm{Vr}_f(i) > \alpha$ and $\mathrm{Vr}_f(I \setminus \{i\}) < \Delta$.

Note that since $\alpha > \Delta$, by monotonicity, the $i \in I$ in the above definition is unique.
Hence, a well-structured subset contains a single high-influence coordinate, while the
remaining coordinates have small total variation.
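
For small $n$ the variation can be computed exactly from Definition 1, which makes it possible to check well-structuredness directly. The following brute-force sketch is purely illustrative: it assumes 0-based indices, a function `f` mapping a list of bits to {-1, 1}, and hypothetical helper names, and its running time is exponential in $n$.

```python
from itertools import product

def variation(f, n, I):
    """Exact Vr_f(I): expectation over w of the variance over z of f(w, z)."""
    I = sorted(I)
    rest = [i for i in range(n) if i not in I]
    total = 0.0
    for w in product((0, 1), repeat=len(rest)):
        x = [0] * n
        for i, bit in zip(rest, w):
            x[i] = bit
        values = []
        for z in product((0, 1), repeat=len(I)):
            for i, bit in zip(I, z):
                x[i] = bit
            values.append(f(x))
        mean = sum(values) / len(values)
        total += sum(v * v for v in values) / len(values) - mean * mean
    return total / 2 ** len(rest)

def is_well_structured(f, n, I, alpha, delta):
    """Check Definition 4: some i in I has variation above alpha, the rest below delta."""
    return any(
        variation(f, n, [i]) > alpha
        and variation(f, n, [j for j in I if j != i]) < delta
        for i in I
    )

# Hypothetical example: f = (-1)^(x0 + x1*x2*x3*x4), where x0 dominates.
f = lambda x: -1 if (x[0] ^ (x[1] & x[2] & x[3] & x[4])) else 1
I = [0, 1, 2, 3, 4]
print(variation(f, 5, [0]), variation(f, 5, [1, 2, 3, 4]))
print(is_well_structured(f, 5, I, alpha=0.5, delta=0.25))
```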

2.1 Background on Schapire and Sellie's algorithm.

In [SS96] Schapire and Sellie gave an algorithm, which we refer to as LearnPoly, for
exactly learning $s$-sparse GF(2) polynomials using membership queries (i.e. black-box
queries) and equivalence queries. Their algorithm is proper; this means that every
equivalence query the algorithm makes (including the final hypothesis of the algorithm)
is an $s$-sparse polynomial. (We shall see that it is indeed crucial for our purposes that
the algorithm is proper.) Recall that in an equivalence query the learning algorithm
proposes a hypothesis $h$ to the oracle: if $h$ is logically equivalent to the target function
being learned then the response is "correct" and learning ends successfully, otherwise
the response is "no" and the learner is given a counterexample $x$ such that $h(x) \neq f(x)$.
Schapire and Sellie proved the following about their algorithm:

Theorem 2. [[SS96], Theorem 10] Algorithm LearnPoly is a proper exact learning
algorithm for the class of $s$-sparse GF(2) polynomials over $\{0,1\}^n$. The algorithm
runs in $\mathrm{poly}(n, s)$ time and makes at most $\mathrm{poly}(n, s)$ membership queries and at most
$ns + 2$ equivalence queries.

We can easily also characterize the behavior of LearnPoly if it is run on a function
$f$ that is not an $s$-sparse polynomial. In this case, since the algorithm is proper all of its
equivalence queries have $s$-sparse polynomials as their hypotheses, and consequently
no equivalence query will ever be answered "correct." So if the $(ns + 2)$-th equivalence
query is not answered "correct," the algorithm may infer that the target function is not
an $s$-sparse polynomial, and it returns "not $s$-sparse."

A well-known result due to Angluin [Ang88] says that in a Probably Approximately
Correct or PAC setting (where there is a distribution $\mathcal{D}$ over examples and the goal is to
construct an $\epsilon$-accurate hypothesis with respect to that distribution), equivalence queries
can be straightforwardly simulated using random examples. This is done simply by
drawing a sufficiently large sample of random examples for each equivalence query
and evaluating both the hypothesis $h$ and the target function $f$ on each point in the
sample. This either yields a counterexample (which simulates an equivalence query),
or if no counterexample is obtained then simple arguments show that for a large enough
($O(\log(1/\delta)/\epsilon)$-size) sample, with probability $1 - \delta$ the functions $f$ and $h$ must be
$\epsilon$-close under the distribution $\mathcal{D}$, which is the success criterion for PAC learning. This
directly gives the following corollary of Theorem 2:

Corollary 1. There is a uniform distribution membership query proper learning algorithm,
which we call LearnPoly'$(s, n, \epsilon, \delta)$, which makes $Q(s, n, \epsilon, \delta) = \mathrm{poly}(s, n, 1/\epsilon,
\log(1/\delta))$ membership queries and runs in $\mathrm{poly}(Q)$ time to learn $s$-sparse polynomials
over $\{0,1\}^n$ to accuracy $\epsilon$ and confidence $1 - \delta$ under the uniform distribution.
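
The simulation of an equivalence query by random examples can be sketched as follows for the uniform distribution. The sample size $\lceil \ln(1/\delta)/\epsilon \rceil$ used here is one concrete choice consistent with the $O(\log(1/\delta)/\epsilon)$ bound above: if $h$ is $\epsilon$-far from $f$, all $m$ samples miss the disagreement region with probability at most $(1-\epsilon)^m \le e^{-\epsilon m} \le \delta$. The function name and the example functions are assumptions for illustration.

```python
import math
import random

def simulate_equivalence_query(h, f, n, eps, delta, rng):
    """Return a counterexample x with h(x) != f(x), or None if the sample finds none."""
    m = math.ceil(math.log(1 / delta) / eps)
    for _ in range(m):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        if h(x) != f(x):
            return x          # plays the role of the oracle's counterexample
    return None               # h is eps-close to f with probability at least 1 - delta

# Hypothetical target f(x) = x0 and two hypotheses.
f = lambda x: x[0]
good = lambda x: x[0]
bad = lambda x: 0             # disagrees with f on half of all inputs

rng = random.Random(0)
print(simulate_equivalence_query(good, f, 5, 0.1, 0.01, rng))   # None
print(simulate_equivalence_query(bad, f, 5, 0.1, 0.01, rng))    # some x with x[0] == 1
```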

3 On restrictions which simplify sparse polynomials 

This section presents Theorem 3, which gives the intuition behind our testing algorithm, 
and lies at the heart of the completeness proof. We give the full proof of Theorem 3 in 
Appendix A (see the full version). 

Roughly speaking, the theorem says the following: consider any $s$-sparse GF(2)
polynomial $p$. Suppose that its coordinates are randomly partitioned into $r = \mathrm{poly}(s)$
many subsets $\{I_j\}_{j=1}^r$. The first two statements say that w.h.p. a randomly chosen
"threshold value" $\alpha \approx 1/\mathrm{poly}(s)$ will have the property that no single coordinate $i$,
$i \in [n]$, or subset $I_j$, $j \in [r]$, has $\mathrm{Vr}_p(i)$ or $\mathrm{Vr}_p(I_j)$ "too close" to $\alpha$. Moreover, the
high-variation subsets (w.r.t. $\alpha$) are precisely those that contain a single high-variation
element $i$ (i.e. $\mathrm{Vr}_p(i) > \alpha$), and in fact each such subset $I_j$ is well-structured (part 3).
Also, the number of such high-variation subsets is small (part 4). Finally, let $p'$ be the
restriction of $p$ obtained by setting all variables in the low-variation subsets to 0. Then,
$p'$ has a nice structure: it has at most one relevant variable per high-variation subset
(part 5), and it is close to $p$ (part 6).

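Theorem 3. Let $p : \{0,1\}^n \to \{-1,1\}$ be an $s$-sparse polynomial. Fix $\tau \in (0,1)$ and
$\Delta$ such that $\Delta \le \Delta_0 \stackrel{\text{def}}{=} \tau/(1600 s^3 \log(8s^3/\tau))$ and $\Delta = \mathrm{poly}(\tau/s)$. Let $r = 4Cs/\Delta$,
for a suitably large constant $C$. Let $\{I_j\}_{j=1}^r$ be a random partition of $[n]$. Choose $\alpha$
uniformly at random from the set $\mathcal{A}(\tau, \Delta) \stackrel{\text{def}}{=} \{\frac{\tau}{4s^2} + (8\ell - 4)\Delta : \ell \in [K]\}$ where $K$ is
the largest integer such that $8K\Delta \le \frac{\tau}{4s^2}$. Then with probability at least 9/10 (over the
choice of $\alpha$ and $\{I_j\}_{j=1}^r$), all of the following statements hold:

1. Every variable $x_i$, $i \in [n]$, has $\mathrm{Vr}_p(i) \notin [\alpha - 4\Delta, \alpha + 4\Delta]$.

2. Every subset $I_j$, $j \in [r]$, has $\mathrm{Vr}_p(I_j) \notin [\alpha - 3\Delta, \alpha + 4\Delta]$.

3. For every $j \in H(\alpha)$, $I_j$ is $(\alpha, \Delta)$-well structured.

4. $|H(\alpha)| \le s \log(8s^3/\tau)$.

Let $p' \stackrel{\text{def}}{=} p|_{0 \leftarrow \cup_{j \in L(\alpha)} I_j}$ (the restriction obtained by fixing all variables in low-variation
subsets to 0).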

5. For every $j \in H(\alpha)$, $p'$ has at most one relevant variable in $I_j$ (hence $p'$ is a
$|H(\alpha)|$-junta).

Algorithm Test-Sparse-Poly$(f, s, \epsilon)$

Input: Black-box access to $f : \{0,1\}^n \to \{-1,1\}$; sparsity parameter $s \ge 1$; error parameter
$\epsilon > 0$

Output: "yes" if $f$ is an $s$-sparse GF(2) polynomial, "no" if $f$ is $\epsilon$-far from every $s$-sparse
GF(2) polynomial

1. Let $\tau = \Theta(\epsilon)$, $\Delta = \Theta(\mathrm{poly}(\tau, 1/s))$, $r = O(s/\Delta)$, $\delta = \Theta(\mathrm{poly}(\tau, 1/s))$.ᵃ

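2. Set $\{I_j\}_{j=1}^r$ to be a random partition of $[n]$.

3. Choose $\alpha$ uniformly at random from the set $\mathcal{A}(\tau, \Delta) \stackrel{\text{def}}{=} \{\frac{\tau}{4s^2} + (8\ell - 4)\Delta : 1 \le \ell \le K\}$
where $K$ is the largest integer such that $8K\Delta \le \frac{\tau}{4s^2}$.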

4. For each subset $I_1, \ldots, I_r$ run the independence test $M = \frac{2}{\Delta^2} \ln(200r)$ times and let
$\widetilde{\mathrm{Vr}}_f(I_j)$ denote $2 \times$ (fraction of the $M$ runs on $I_j$ that the test rejects). If any subset $I_j$
has $\widetilde{\mathrm{Vr}}_f(I_j) \in [\alpha - 2\Delta, \alpha + 3\Delta]$ then exit and return "no," otherwise continue.

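5. Let $L(\alpha) \subseteq [r]$ denote $\{j \in [r] : \widetilde{\mathrm{Vr}}_f(I_j) < \alpha - 2\Delta < \alpha\}$ and let $H(\alpha)$ denote
$[r] \setminus L(\alpha)$. Let $f' : \{0,1\}^n \to \{-1,1\}$ denote the function $f|_{0 \leftarrow \cup_{j \in L(\alpha)} I_j}$.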

6. Draw a sample of $m = \frac{2}{\epsilon} \ln 12$ uniform random examples from $\{0,1\}^n$ and evaluate
both $f'$ and $f$ on each of these examples. If $f$ and $f'$ disagree on any of the $m$ examples
then exit and return "no." If they agree on all examples then continue.

7. Run the learning algorithm LearnPoly'$(s, |H(\alpha)|, \epsilon/4, 1/100)$ from [SS96] using
SimMQ$(f, H(\alpha), \{I_j\}_{j \in H(\alpha)}, \alpha, \Delta, z, \delta/Q(s, |H(\alpha)|, \epsilon/4, 1/100))$ to simulate each
membership query on a string $z \in \{0,1\}^{|H(\alpha)|}$ that LearnPoly' makes. If LearnPoly'
returns "not $s$-sparse" then exit and return "no." Otherwise the algorithm terminates
successfully; in this case return "yes."

ᵃ More precisely, we set $\tau = \epsilon/600$, $\Delta = \min\{\Delta_0, (\tau/8s^2)(\delta/\ln(2/\delta))\}$, $r = 4Cs/\Delta$
(for a suitable constant $C$ from Theorem 3), where $\Delta_0 \stackrel{\text{def}}{=} \tau/(1600 s^3 \log(8s^3/\tau))$ and
$\delta \stackrel{\text{def}}{=} 1/\big(100 s \log(8s^3/\tau)\, Q(s, s \log(8s^3/\tau), \epsilon/4, 1/100)\big)$.



Fig. 1. The algorithm Test-Sparse-Poly. 

6. The function $p'$ is $\tau$-close to $p$.

Theorem 3 naturally suggests a testing algorithm, whereby we attempt to partition
the coordinates of a function $f$ into "high-variation" subsets and "low-variation"
subsets, then zero-out the variables in low-variation subsets and implicitly learn the remaining
function $f'$ on only $\mathrm{poly}(s, 1/\epsilon)$ many variables. This is exactly the approach
we take in the next section.

4 The testing algorithm Test-Sparse-Poly 

In this section we present our main testing algorithm and give high-level sketches of
the arguments establishing its completeness and soundness. The algorithm, which is
called Test-Sparse-Poly, takes as input the values $s, \epsilon > 0$ and black-box access to
$f : \{0,1\}^n \to \{-1,1\}$. It is presented in full in Figure 1.

Algorithm Set-High-Influence-Variable$(f, I, \alpha, \Delta, b, \delta)$

Input: Black-box access to $f : \{0,1\}^n \to \{-1,1\}$; $(\alpha, \Delta)$-well-structured set $I \subseteq [n]$; bit
$b \in \{0,1\}$; failure parameter $\delta$.

Output: assignment $w \in \{0,1\}^I$ to the variables in $I$ such that $w_i = b$ with probability
$1 - \delta$

1. Draw $x$ uniformly from $\{0,1\}^I$. Define $I^0 \stackrel{\text{def}}{=} \{j \in I : x_j = 0\}$ and $I^1 \stackrel{\text{def}}{=} \{j \in I :
x_j = 1\}$.

2. Apply $c = \frac{2}{\alpha} \ln(\frac{2}{\delta})$ iterations of the independence test to $(f, I^0)$. If any of the $c$ iterations
reject, mark $I^0$. Do the same for $(f, I^1)$.

3. If both or neither of $I^0$ and $I^1$ are marked, stop and output "fail".

4. If $I^b$ is marked then return the assignment $w = x$. Otherwise return the assignment
$w = \bar{x}$ (the bitwise negation of $x$).



Fig. 2. The subroutine Set-High-Influence-Variable.
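
The subroutine of Figure 2 translates almost line by line into code. The Python sketch below is an illustration under stated assumptions rather than a reference implementation: indices are 0-based, `f` maps a list of bits to {-1, 1}, "fail" is modeled by returning `None`, the unused parameter $\Delta$ is omitted, and the iteration count $c = \lceil (2/\alpha)\ln(2/\delta) \rceil$ is an assumed choice motivated by Lemma 2 (a part containing a variable of variation above $\alpha$ is rejected with probability above $\alpha/2$ per run).

```python
import math
import random

def independence_test(f, n, I, rng):
    """Two-query independence test; True means accept."""
    w = [rng.randint(0, 1) for _ in range(n)]
    x1, x2 = list(w), list(w)
    for i in I:
        x1[i] = rng.randint(0, 1)
        x2[i] = rng.randint(0, 1)
    return f(x1) == f(x2)

def set_high_influence_variable(f, n, I, alpha, b, delta, rng):
    """Return a dict assigning bits to I with the high-variation variable set to b."""
    x = {j: rng.randint(0, 1) for j in I}                        # Step 1
    parts = {bit: [j for j in I if x[j] == bit] for bit in (0, 1)}
    c = math.ceil((2 / alpha) * math.log(2 / delta))             # assumed iteration count
    marked = {
        bit: any(not independence_test(f, n, parts[bit], rng) for _ in range(c))
        for bit in (0, 1)
    }                                                            # Step 2
    if marked[0] == marked[1]:                                   # Step 3: both or neither
        return None
    if marked[b]:                                                # Step 4
        return x
    return {j: 1 - bit for j, bit in x.items()}                  # bitwise negation of x

# Hypothetical example: only x2 matters, so I = [1, 2, 3] is well-structured.
f = lambda x: -1 if x[2] else 1
rng = random.Random(0)
w = set_high_influence_variable(f, 6, [1, 2, 3], alpha=0.5, b=1, delta=0.01, rng=rng)
print(w)
```

The caller never learns which coordinate of $I$ is the influential one; it only obtains an assignment in which that coordinate has the requested value.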



Algorithm SimMQ$(f, H, \{I_j\}_{j \in H}, \alpha, \Delta, z, \delta)$

Input: Black-box access to $f : \{0,1\}^n \to \{-1,1\}$; subset $H \subseteq [r]$; disjoint subsets $\{I_j\}_{j \in H}$
of $[n]$; parameters $\alpha > \Delta$; string $z \in \{0,1\}^{|H|}$; failure probability $\delta$

Output: bit $b$ which, with probability $1 - \delta$, is the value of $f'$ on a random assignment $x$ in
which each high-variation variable $i \in I_j$ ($j \in H$) is set according to $z$

1. For each $j \in H$, call Set-High-Influence-Variable$(f, I_j, \alpha, \Delta, z_j, \delta/|H|)$ and get back
an assignment (call it $w^j$) to the variables in $I_j$.

2. Construct $x \in \{0,1\}^n$ as follows: for each $j \in H$, set the variables in $I_j$ according to
$w^j$. This defines $x_i$ for all $i \in \cup_{j \in H} I_j$. Set $x_i = 0$ for all other $i \in [n]$.

3. Return $b = f(x)$.



Fig. 3. The subroutine SimMQ. 
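
Figure 3 can likewise be sketched in a few lines on top of the previous subroutine. As before, this is a hedged illustration, not the authors' code: `parts` is an assumed dict mapping each $j \in H$ to the list $I_j$, "fail" is modeled by `None`, the per-call failure parameter $\delta/|H|$ and the iteration count inside the helper are assumptions, and the example function is hypothetical.

```python
import math
import random

def independence_test(f, n, I, rng):
    """Two-query independence test; True means accept."""
    w = [rng.randint(0, 1) for _ in range(n)]
    x1, x2 = list(w), list(w)
    for i in I:
        x1[i] = rng.randint(0, 1)
        x2[i] = rng.randint(0, 1)
    return f(x1) == f(x2)

def set_high_influence_variable(f, n, I, alpha, b, delta, rng):
    """Sketch of Figure 2; returns an assignment to I, or None for "fail"."""
    x = {j: rng.randint(0, 1) for j in I}
    parts = {bit: [j for j in I if x[j] == bit] for bit in (0, 1)}
    c = math.ceil((2 / alpha) * math.log(2 / delta))   # assumed iteration count
    marked = {
        bit: any(not independence_test(f, n, parts[bit], rng) for _ in range(c))
        for bit in (0, 1)
    }
    if marked[0] == marked[1]:
        return None
    return x if marked[b] else {j: 1 - bit for j, bit in x.items()}

def sim_mq(f, n, H, parts, alpha, z, delta, rng):
    """Simulate one membership query z: one requested bit per subset index in H."""
    x = [0] * n                                         # coordinates outside H stay 0
    for pos, j in enumerate(H):
        w = set_high_influence_variable(f, n, parts[j], alpha, z[pos], delta / len(H), rng)
        if w is None:
            return None
        for i, bit in w.items():
            x[i] = bit
    return f(x)

# Hypothetical example: f = (-1)^(x0*x5 + x9); coordinates 3, 7, 11 lie in no subset.
f = lambda x: -1 if ((x[0] & x[5]) ^ x[9]) else 1
parts = {0: [0, 1, 2], 1: [4, 5, 6], 2: [8, 9, 10]}
rng = random.Random(0)
print(sim_mq(f, 12, [0, 1, 2], parts, 0.25, (1, 1, 0), 0.01, rng))
```

In this toy instance the simulated oracle answers as the three-variable function $(-1)^{z_0 z_1 + z_2}$, even though the caller never identifies which coordinates of $f$ are relevant.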



The first thing Test-Sparse-Poly does (Step 2) is randomly partition the coordinates
into $r = \tilde{O}(s^4/\tau)$ subsets. In Steps 3 and 4 the algorithm attempts to distinguish subsets
that contain a high-influence variable from subsets that do not; this is done by using the
independence test to estimate the variation of each subset (see Lemma 2).

Once the high-variation and low-variation subsets have been identified, intuitively
we would like to focus our attention on the high-influence variables. Thus, Step 5 of
the algorithm defines a function $f'$ which "zeroes out" all of the variables in all low-variation
subsets. Step 6 of Test-Sparse-Poly checks that $f$ is close to $f'$.

The final step of Test-Sparse-Poly is to run the algorithm LearnPoly' of [SS96] to
learn a sparse polynomial, which we call $f''$, which is isomorphic to $f'$ but is defined
only over the high-influence variables of $f$ (recall that if $f$ is indeed $s$-sparse, there is
at most one from each high-variation subset). The overall Test-Sparse-Poly algorithm
accepts $f$ if and only if LearnPoly' successfully returns a final hypothesis (i.e. does
not halt and output "fail"). The membership queries that the [SS96] algorithm requires
are simulated using the SimMQ procedure, which in turn uses a subroutine called
Set-High-Influence-Variable.

The procedure Set-High-Influence-Variable (SHIV) is presented in Figure 2. The
idea of this procedure is that when it is run on a well-structured subset of variables $I$,
it returns an assignment in which the high-variation variable is set to the desired bit
value. Intuitively, the executions of the independence test in the procedure are used to
determine whether the high-variation variable $i \in I$ is set to 0 or 1 under the assignment
$x$. Depending on whether this setting agrees with the desired value, the algorithm either
returns $x$ or the bitwise negation of $x$ (this is slightly different from Construct-Sample,
the analogous subroutine in [DLM+07], which is content with a random $x$ and thus
never needs to negate coordinates).

Figure 3 gives the SimMQ procedure. When run on a function $f$ and a collection
$\{I_j\}_{j \in H}$ of disjoint well-structured subsets of variables, SimMQ takes as input a string
$z$ of length $|H|$ which specifies a desired setting for each high-variation variable in each
$I_j$ ($j \in H$). SimMQ constructs a random assignment $x \in \{0,1\}^n$ such that the high-variation
variable in each $I_j$ ($j \in H$) is set in the desired way in $x$, and it returns the
value $f'(x)$.

4.1 Time and Query Complexity of Test-Sparse-Poly 

As stated in Figure 1, the Test-Sparse-Poly algorithm runs LearnPoly'$(s, |H(\alpha)|,
\epsilon/4, 1/100)$ using SimMQ$(f, H(\alpha), \{I_j\}_{j \in H(\alpha)}, \alpha, \Delta, z, 1/(100\, Q(s, |H(\alpha)|, \epsilon/4, 1/100)))$
to simulate each membership query on an input string $z \in \{0,1\}^{|H(\alpha)|}$. Thus the algorithm
is being run over a domain of $|H(\alpha)|$ variables. Since we certainly have $|H(\alpha)| \le
r \le \mathrm{poly}(s, 1/\epsilon)$, Corollary 1 gives that LearnPoly' makes at most $\mathrm{poly}(s, 1/\epsilon)$ many calls
to SimMQ. From this point, by inspection of SimMQ, SHIV and Test-Sparse-Poly,
it is straightforward to verify that Test-Sparse-Poly indeed makes $\mathrm{poly}(s, 1/\epsilon)$ many
queries to $f$ and runs in time $\mathrm{poly}(s, 1/\epsilon)$ as claimed in Theorem 1. Thus, to prove Theorem 1
it remains only to establish correctness of the test.

4.2 Sketch of completeness 

The main tool behind our completeness argument is Theorem 3. Suppose $f$ is indeed an
$s$-sparse polynomial. Then Theorem 3 guarantees that a randomly chosen $\alpha$ will w.h.p.
yield a "gap" such that subsets with a high-influence variable have variation above the
gap, and subsets with no high-influence variable have variation below the gap. This
means that the estimates of each subset's variation (obtained by the algorithm in Step
4) are accurate enough to effectively separate the high-variation subsets from the low-variation
ones in Step 5. Thus, the function $f'$ defined by the algorithm will w.h.p. be
equal to the function $p'$ from Theorem 3.

Assuming that $f$ is an $s$-sparse polynomial (and that $f'$ is equal to $p'$), Theorem 3
additionally implies that the function $f'$ will be close to the original function (so Step
6 will pass), that $f'$ only depends on $\mathrm{poly}(s, 1/\epsilon)$ many variables, and that all of the
subsets $I_j$ that "survive" into $f'$ are well-structured. As we show in Appendix B, this
condition is sufficient to ensure that SimMQ can successfully simulate membership
queries to $f''$. Thus, for $f$ an $s$-sparse polynomial, the LearnPoly' algorithm can run
successfully, and the test will accept.

4.3 Sketch of soundness

Here, we briefly argue that if Test-Sparse-Poly accepts $f$ with high probability, then
$f$ must be close to some $s$-sparse polynomial (we give the full proof in Appendix C).
Note that if $f$ passes Step 4, then Test-Sparse-Poly must have obtained a partition of
variables into "high-variation" subsets and "low-variation" subsets. If $f$ passes Step 6,
then it must moreover be the case that $f$ is close to the function $f'$ obtained by zeroing
out the low-variation subsets.

In the last step, Test-Sparse-Poly attempts to run the LearnPoly' algorithm using
$f'$ and the high-variation subsets; in the course of doing this, it makes calls to SimMQ.
Since $f$ could be an arbitrary function, we do not know whether each high-variation
subset has at most one variable relevant to $f'$ (as would be the case, by Theorem 3,
if $f$ were an s-sparse polynomial). However, we are able to show (Lemma 11) that,
if with high probability all calls to the SimMQ routine are answered without its ever
returning "fail," then $f'$ must be close to a junta $g$ whose relevant variables are the
individual "highest-influence" variables in each of the high-variation subsets. Now, given
that LearnPoly' halts successfully, it must be the case that it constructs a final
hypothesis $h$ that is itself an s-sparse polynomial and that agrees with many calls to
SimMQ on random examples. Lemma 12 states that, in this event, $h$ must be close to $g$,
hence close to $f'$, and hence close to $f$.



5 Conclusion and future directions 

An obvious question raised by our work is whether similar methods can be used to
efficiently test s-sparse polynomials over a general finite field $\mathbb{F}$, with query and
time complexity polynomial in $s$, $1/\epsilon$, and $|\mathbb{F}|$. The basic algorithm of [DLM+07]
uses $\tilde{O}((s|\mathbb{F}|)^4/\epsilon^2)$ queries to test s-sparse polynomials over $\mathbb{F}$, but has
running time $2^{\omega(s|\mathbb{F}|)} \cdot (1/\epsilon)^{\log\log(1/\epsilon)}$ (arising, as discussed in
Section 1, from brute-force search for a consistent hypothesis). One might hope to improve
that algorithm by using techniques from the current paper. However, doing so requires an
algorithm for properly learning s-sparse polynomials over general finite fields. To the
best of our knowledge, the most efficient algorithm for doing this (given only black-box
access to $f : \mathbb{F}^n \to \mathbb{F}$) is the algorithm of Bshouty [Bsh97b], which requires
$m = s^{O(|\mathbb{F}| \log |\mathbb{F}|)} \log n$ queries and runs in $\mathrm{poly}(m, n)$ time. (Other learning
algorithms are known which do not have this exponential dependence on $|\mathbb{F}|$, but they
either require evaluating the polynomial at complex roots of unity [Man95] or on inputs
belonging to an extension field of $\mathbb{F}$ [GKS90, Kar89].) It would be interesting to know
whether there is a testing algorithm that simultaneously achieves a polynomial runtime
(and hence query complexity) dependence on both the size parameter $s$ and the cardinality
of the field $|\mathbb{F}|$.

Another goal for future work is to apply our methods to other classes beyond just 
polynomials. Is it possible to combine the "testing by implicit learning" approach of 
[DLM+07] with other membership-query-based learning algorithms, to achieve time 
and query efficient testers for other natural classes? 

A Proof of Theorem 3

In Section A.1 we prove some useful preliminary lemmas about the variation of individual
variables in sparse polynomials. In Section A.2 we extend this analysis to get
high-probability statements about variation of subsets $\{I_j\}_{j=1}^{r}$ in a random partition.
We put the pieces together to finish the proof of Theorem 3 in Section A.3.

Throughout this section the parameters $\tau$, $\Delta$, $r$ and $\alpha$ are all as defined in
Theorem 3.

A.1 The influence of variables in s-sparse polynomials

We start with a simple lemma stating that only a small number of variables can have
large variation:

Lemma 3. Let $p : \{0,1\}^n \to \{-1,1\}$ be an s-sparse polynomial. For any $\delta > 0$, there
are at most $s \log(2s/\delta)$ many variables $x_i$ that have $\mathrm{Vr}_p(i) \ge \delta$.

Proof. Any variable $x_i$ with $\mathrm{Vr}_p(i) \ge \delta$ must occur in some term of length at most
$\log(2s/\delta)$. (Otherwise each occurrence of $x_i$ would contribute less than $\delta/s$ to the
variation of the i-th coordinate, and since there are at most $s$ terms this would imply
$\mathrm{Vr}_p(i) < s \cdot (\delta/s) = \delta$.) Since at most $s \log(2s/\delta)$ distinct variables can occur in
terms of length at most $\log(2s/\delta)$, the lemma follows. ■
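Lemma 3 can be sanity-checked by brute force on small instances. The sketch below is only an illustration under stated assumptions: the variation of a single variable is taken to be the probability that flipping $x_i$ changes the value of $p$ (the definition of variation appears earlier in the paper, outside this section), logarithms are base 2, and all function names are hypothetical.

```python
import itertools
import math
import random


def evaluate(terms, x):
    """Evaluate a GF(2) polynomial: XOR over terms of the AND of each term's variables."""
    return sum(all(x[i] for i in term) for term in terms) % 2


def variation(terms, n, i):
    """Fraction of inputs on which flipping x_i changes the polynomial's value (assumed definition)."""
    changed = 0
    for x in itertools.product((0, 1), repeat=n):
        y = list(x)
        y[i] ^= 1
        changed += evaluate(terms, x) != evaluate(terms, y)
    return changed / 2 ** n


def count_high_variation(terms, n, delta):
    """Number of variables whose variation is at least delta."""
    return sum(variation(terms, n, i) >= delta for i in range(n))


def random_sparse_poly(n, s, rng):
    """Return s distinct non-constant monomials, each a frozenset of variable indices."""
    terms = set()
    while len(terms) < s:
        size = rng.randint(1, n)
        terms.add(frozenset(rng.sample(range(n), size)))
    return list(terms)
```

With $n = 8$, $s = 2$ and $\delta = 1/2$ the bound $s \log(2s/\delta) = 6$ is below $n$, so the check is not vacuous on such instances.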

Lemma 4. With probability at least 96/100 over the choice of $\alpha$, no variable $x_i$ has
$\mathrm{Vr}_p(i) \in [\alpha - 4\Delta, \alpha + 4\Delta]$.

Proof. The uniform random variable $\alpha$ has support $\mathcal{A}(\tau, \Delta)$ of size no less than
$50 s \log(8s^3/\tau)$. Each possible value of $\alpha$ defines the interval of variations
$[\alpha - 4\Delta, \alpha + 4\Delta]$. Note that $\alpha - 4\Delta \ge \tau/(4s^2)$. In other words, the only
variables which could lie in $[\alpha - 4\Delta, \alpha + 4\Delta]$ are those with variation at least
$\tau/(4s^2)$. By Lemma 3 there are at most $k := s \log(8s^3/\tau)$ such candidate variables.
Since we have at least $50k$ intervals (two consecutive such intervals overlap at a single
point) and at most $k$ candidate variables, by the pigeonhole principle, at least $48k$
intervals will be empty. ■
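The counting step in this proof can be illustrated with a short sketch. The grid of interval centers below is a hypothetical stand-in for the support of $\alpha$ (its exact definition is given with Theorem 3, outside this section); the only property used is that consecutive closed intervals of half-width $4\Delta$ share a single endpoint, so each candidate variation value lies in at most two of them.

```python
import random


def grid_centers(start, Delta, count):
    """Hypothetical grid: closed intervals of half-width 4*Delta that touch at one endpoint."""
    return [start + 4 * Delta + 8 * Delta * ell for ell in range(count)]


def count_empty_intervals(centers, half_width, points):
    """Number of closed intervals [c - half_width, c + half_width] containing no point."""
    return sum(
        all(abs(p - c) > half_width for p in points)
        for c in centers
    )


k = 5
centers = grid_centers(0, 1, 50 * k)        # 250 intervals: [0, 8], [8, 16], ...
worst_case_points = [8, 24, 40, 56, 72]     # each sits on a shared endpoint
empty = count_empty_intervals(centers, 4, worst_case_points)
```

In this worst case each of the $k$ points hits two intervals, leaving exactly $48k$ of the $50k$ intervals empty, which is where the 96/100 comes from.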

Lemma 3 is based on the observation that, in a sparse polynomial, a variable with "high" 
influence (variation) must occur in some "short" term. The following lemma is in some 
sense a quantitative converse: it states that a variable with "small" influence can only 
appear in "long" terms. 

Lemma 5. Let $p : \{0,1\}^n \to \{-1,1\}$ be an s-sparse polynomial. Suppose that $i$ is such
that $\mathrm{Vr}_p(i) < \tau/(s^2 + s)$. Then the variable $x_i$ appears only in terms of length greater
than $\log(s/\tau)$.

Proof. By contradiction. Assuming that $x_i$ appears in some term of length at most
$\log(s/\tau)$, we will show that $\mathrm{Vr}_p(i) \ge \tau/(s^2 + s)$. Let $T$ be a shortest term that $x_i$
appears in. The function $p$ can be uniquely decomposed as follows:
$p(x_1, x_2, \ldots, x_n) = x_i \cdot (T' + p_1) + p_2$, where $T = x_i \cdot T'$, the term $T'$ has length
less than $\log(s/\tau)$ and does not depend on $x_i$, and $p_1$, $p_2$ are s-sparse polynomials
that do not depend on $x_i$. Observe that since $T$ is a shortest term that contains $x_i$,
the polynomial $p_1$ does not contain the constant term 1.

Since $T'$ contains fewer than $\log(s/\tau)$ many variables, it evaluates to 1 on at least
a $\tau/s$ fraction of all inputs. The partial assignment that sets all the variables in $T'$
to 1 induces an s-sparse polynomial $p_1'$ (the restriction of $p_1$ according to the partial
assignment). Now observe that $p_1'$ still does not contain the constant term 1 (for since
each term in $p_1$ is of length at least the length of $T'$, no term in $p_1$ is a subset of the
variables in $T'$). We now recall the following (nontrivial) result of Karpinski and Luby
[KL93]:

Claim ([KL93], Corollary 1). Let $g$ be an s-sparse multivariate GF(2) polynomial
which does not contain the constant-1 term. Then $g(x) = 0$ for at least a $1/(s + 1)$
fraction of all inputs.

Applying this corollary to the polynomial $p_1'$, we have that $p_1'$ is 0 on at least a
$1/(s + 1)$ fraction of its inputs. Therefore, the polynomial $T' + p_1$ is 1 on at least a
$(\tau/s) \cdot 1/(s + 1)$ fraction of all inputs in $\{0,1\}^n$; this in turn implies that
$\mathrm{Vr}_p(i) \ge (\tau/s) \cdot 1/(s + 1) = \tau/(s^2 + s)$. ■
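The Karpinski and Luby bound used in this proof is easy to probe by exhaustive enumeration on small polynomials. The following is a minimal sketch with hypothetical helper names; it represents a polynomial as a list of distinct non-empty monomials and does not reproduce anything from [KL93] beyond the statement being checked.

```python
import itertools
import random


def zero_fraction(terms, n):
    """Fraction of inputs in {0,1}^n on which the GF(2) polynomial evaluates to 0."""
    zeros = 0
    for x in itertools.product((0, 1), repeat=n):
        value = sum(all(x[i] for i in term) for term in terms) % 2
        zeros += value == 0
    return zeros / 2 ** n


def random_monomials(n, s, rng):
    """Return s distinct non-empty monomials (so there is no constant-1 term)."""
    terms = set()
    while len(terms) < s:
        size = rng.randint(1, n)
        terms.add(frozenset(rng.sample(range(n), size)))
    return list(terms)


# x0 + x1 + x0*x1 is the OR of two variables: it has 3 terms and vanishes on 1/4 of inputs.
or_of_two = [frozenset({0}), frozenset({1}), frozenset({0, 1})]
tight_example = zero_fraction(or_of_two, 2)
```

The OR example shows that the $1/(s+1)$ bound cannot be improved in general.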

By a simple application of Lemma 5 we can show that setting low-variation vari- 
ables to zero does not change the polynomial by much: 

Lemma 6. Let $p : \{0,1\}^n \to \{-1,1\}$ be an s-sparse polynomial. Let $g$ be a function
obtained from $p$ by setting to 0 some subset of variables all of which have
$\mathrm{Vr}_p(i) < \tau/(2s^2)$. Then $g$ and $p$ are $\tau$-close.

Proof. Setting a variable to 0 removes all the terms that contain it from $p$. By Lemma 5,
doing this only removes terms of length greater than $\log(s/\tau)$. Removing one such term
changes the function on at most a $\tau/s$ fraction of the inputs. Since there are at most $s$
terms in total, the lemma follows by a union bound. ■

A.2 Partitioning variables into random subsets 

The following lemma is at the heart of Theorem 3. The lemma states that when we randomly
partition the variables (coordinates) into subsets, (i) each subset gets at most one
"high-influence" variable (the term "high-influence" here means relative to an appropriate
threshold value $t \ll \alpha$), and (ii) the remaining (low-influence) variables (w.r.t. $t$)
have a "very small" contribution to the subset's total variation.

The first part of the lemma follows easily from a birthday-paradox type argument, 
since there are many more subsets than high-influence variables. As intuition for the 
second part, we note that in expectation, the total variation of each subset is very small. 
A more careful argument lets us argue that the total contribution of the low-influence 
variables in a given subset is unlikely to highly exceed its expectation. 

Lemma 7. Fix a value of $\alpha$ satisfying the first statement of Theorem 3. Let
$t := \Delta\tau/(4C's)$, where $C'$ is a suitably large constant. Then with probability 99/100
over the random partition the following statements hold true:

- For every $j \in [r]$, $I_j$ contains at most one variable $x_i$ with $\mathrm{Vr}_p(i) > t$.

- Let $I_j^{\le t} := \{i \in I_j \mid \mathrm{Vr}_p(i) \le t\}$. Then, for all $j \in [r]$, $\mathrm{Vr}_p(I_j^{\le t}) \le \Delta$.

Proof. We show that each statement of the lemma fails independently with probability
at most 1/200, from which the lemma follows.

By Lemma 3 there are at most $b = s \log(2s/t)$ coordinates in $[n]$ with variation
more than $t$. A standard argument yields that the probability there exists a subset $I_j$
with more than one such variable is at most $b^2/r$. It is easy to verify that this is less
than 1/200, as long as $C$ is large enough relative to $C'$. Therefore, with probability at
least 199/200, every subset contains at most one variable with variation greater than $t$.
So the first statement fails with probability no more than 1/200.

Now for the second statement. Consider a fixed subset $I_j$. We analyze the contribution
of variables in $I_j^{\le t}$ to the total variation $\mathrm{Vr}_p(I_j)$. We will show that with high
probability the contribution of these variables is at most $\Delta$.

Let $S = \{i \in [n] \mid \mathrm{Vr}_p(i) \le t\}$ and renumber the coordinates such that $S = [k']$.
Each variable $x_i$, $i \in S$, is contained in $I_j$ independently with probability $1/r$.
Let $X_1, \ldots, X_{k'}$ be the corresponding independent Bernoulli random variables. Recall
that, by sub-additivity, the variation of $I_j^{\le t}$ is upper bounded by
$X = \sum_{i=1}^{k'} \mathrm{Vr}_p(i) \cdot X_i$. It thus suffices to upper bound the probability
$\Pr[X > \Delta]$. Note that
$E[X] = \sum_{i=1}^{k'} \mathrm{Vr}_p(i) \cdot E[X_i] = (1/r) \cdot \sum_{i=1}^{k'} \mathrm{Vr}_p(i) \le s/r$,
since $\sum_{i=1}^{k'} \mathrm{Vr}_p(i) \le \sum_{i=1}^{n} \mathrm{Vr}_p(i) \le s$. The last inequality follows from
the following simple fact (the proof of which is left for the reader).

Fact 4. Let $p : \{0,1\}^n \to \{-1,1\}$ be an s-sparse polynomial. Then $\sum_{i=1}^{n} \mathrm{Vr}_p(i) \le s$.

To finish the proof, we need the following version of the Chernoff bound: 

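Fact 5 ([MR95]). For $k' \in \mathbb{N}^*$, let $\alpha_1, \ldots, \alpha_{k'} \in [0, 1]$ and let $X_1, \ldots, X_{k'}$ be
independent Bernoulli trials. Let $X' = \sum_{i=1}^{k'} \alpha_i X_i$ and $\mu := E[X'] \ge 0$. Then for any
$\gamma > 1$ we have $\Pr[X' > \gamma \cdot \mu] < (e^{\gamma - 1}/\gamma^{\gamma})^{\mu}$.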

We apply the above bound for the $X_i$'s with $\alpha_i = \mathrm{Vr}_p(i)/t \in [0, 1]$. (Recall that the
coordinates in $S$ have variation at most $t$.) We have
$\mu = E[X'] = E[X]/t \le s/(rt) = C's/(C\tau)$, and we are interested in the event
$\{X > \Delta\} = \{X' > \Delta/t\}$. Note that $\Delta/t = 4C's/\tau$. Hence, $\gamma \ge 4C$ and the above
bound implies that
$\Pr[X > \Delta] < (e/(4C))^{4C's/\tau} < (1/(4C^4))^{C's/\tau}$.

Therefore, for a fixed subset $I_j$, we have
$\Pr[\mathrm{Vr}_p(I_j^{\le t}) > \Delta] < (1/(4C^4))^{C's/\tau}$. By a union bound, we conclude that this
happens in every subset with failure probability at most $r \cdot (1/(4C^4))^{C's/\tau}$. This is
less than 1/200 as long as $C'$ is a large enough absolute constant (independent of $C$),
which completes the proof. ■
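Two small calculations in this proof can be checked directly. The sketch below assumes the random partition places each variable independently and uniformly into one of $r$ subsets (an assumption about the partition procedure, which is defined earlier in the paper); the function name is hypothetical. It computes the exact probability that some two of $b$ marked variables share a subset and compares it with the $b^2/r$ bound, and it confirms the numeric fact $e^4 < 64$ behind the step $(e/(4C))^4 < 1/(4C^4)$.

```python
import math
from fractions import Fraction


def collision_probability(b, r):
    """Exact probability that b items thrown independently into r bins are not all separated."""
    all_distinct = Fraction(1)
    for i in range(b):
        all_distinct *= Fraction(r - i, r)
    return 1 - all_distinct


# With b = 10 high-variation variables and r = 20000 subsets, b^2 / r = 1/200.
p_collide = collision_probability(10, 20000)
birthday_bound = Fraction(10 ** 2, 20000)

# (e / (4C))^4 = e^4 / (256 C^4) is below 1 / (4 C^4) exactly when e^4 < 64.
tail_step_ok = math.e ** 4 < 64
```

The exact collision probability is at most $\binom{b}{2}/r$, so the $b^2/r$ bound quoted in the proof has some slack.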

Next we show that by "zeroing out" the variables in low-variation subsets, we are 
likely to "kill" all terms in p that contain a low-influence variable. 

Lemma 8. With probability at least 99/100 over the random partition, every monomial
of $p$ containing a variable with influence at most $\alpha$ has at least one of its variables in
$\bigcup_{j \in L(\alpha)} I_j$.

Proof. By Lemma 3 there are at most $b = s \log(8s^3/\tau)$ variables with influence more
than $\alpha$. Thus, no matter the partition, at most $b$ subsets from $\{I_j\}_{j=1}^{r}$ contain such
variables. Fix a low-influence variable (influence at most $\alpha$) from every monomial
containing such a variable. For each fixed variable, the probability that it ends up in the
same subset as a high-influence variable is at most $b/r$. Union bounding over each of
the (at most $s$) monomials, the failure probability of the lemma is upper bounded by
$sb/r < 1/100$. ■



A.3 Proof of Theorem 3 

Proof. (Theorem 3) We prove each statement in turn. The first statement of the theorem 
is implied by Lemma 4. (Note that, as expected, the validity of this statement does not 
depend on the random partition.) 

We claim that statements 2-5 essentially follow from Lemma 7. (In contrast, the 
validity of these statements crucially depends on the random partition.) 

Let us first prove the third statement. We want to show that (w.h.p. over the choice
of $\alpha$ and $\{I_j\}_{j=1}^{r}$) for every $j \in H(\alpha)$, (i) there exists a unique $i_j \in I_j$ such that
$\mathrm{Vr}_p(i_j) \ge \alpha$ and (ii) that $\mathrm{Vr}_p(I_j \setminus \{i_j\}) \le \Delta$. Fix some $j \in H(\alpha)$. By Lemma 7,
for a given value of $\alpha$ satisfying the first statement of the theorem, we have: (i') $I_j$
contains at most one variable $x_{i_j}$ with $\mathrm{Vr}_p(i_j) > t$ and (ii') $\mathrm{Vr}_p(I_j \setminus \{i_j\}) \le \Delta$.
Since $t < \tau/(4s^2) < \alpha$ (with probability 1), (i') clearly implies that, if $I_j$ has a
high-variation element (w.r.t. $\alpha$), then it is unique. In fact, we claim that
$\mathrm{Vr}_p(i_j) \ge \alpha$. For otherwise, by sub-additivity of variation, we would have
$\mathrm{Vr}_p(I_j) \le \mathrm{Vr}_p(I_j \setminus \{i_j\}) + \mathrm{Vr}_p(i_j) \le \Delta + \alpha - 4\Delta = \alpha - 3\Delta < \alpha$,
which contradicts the assumption that $j \in H(\alpha)$. Note that we have used the fact that
$\alpha$ satisfies the first statement of the theorem, that is,
$\mathrm{Vr}_p(i_j) < \alpha \Rightarrow \mathrm{Vr}_p(i_j) < \alpha - 4\Delta$. Hence, for a "good" value of $\alpha$ (one satisfying
the first statement of the theorem), the third statement is satisfied with probability at
least 99/100 over the random partition. By Lemma 4, a "good" value of $\alpha$ is chosen with
probability 96/100. By independence, the conclusions of Lemma 4 and Lemma 7 hold
simultaneously with probability more than 9/10.

We now establish the second statement. We assume as before that $\alpha$ is a "good"
value. Consider a fixed subset $I_j$, $j \in [r]$. If $j \in H(\alpha)$ (i.e. $I_j$ is a high-variation
subset) then, with probability at least 99/100 (over the random partition), there exists
$i_j \in I_j$ such that $\mathrm{Vr}_p(i_j) \ge \alpha + 4\Delta$. The monotonicity of variation yields
$\mathrm{Vr}_p(I_j) \ge \mathrm{Vr}_p(i_j) \ge \alpha + 4\Delta$. If $j \in L(\alpha)$ then $I_j$ contains no high-variation
variable, i.e. its maximum variation element has variation at most $\alpha - 4\Delta$, and by the
second part of Lemma 7 the remaining variables contribute at most $\Delta$ to its total
variation. Hence, by sub-additivity we have that $\mathrm{Vr}_p(I_j) \le \alpha - 3\Delta$. Since a "good"
value of $\alpha$ is chosen with probability 96/100, the desired statement follows.

The fourth statement follows from the aforementioned and the fact that there exist at
most $s \log(8s^3/\tau)$ variables with variation at least $\alpha$ (as follows from Lemma 3, given
that $\alpha > \tau/(4s^2)$).

Now for the fifth statement. Lemma 8 and monotonicity imply that the only variables
that remain relevant in $p'$ are (some of) those with high influence (at least $\alpha$) in $p$,
and, as argued above, each high-variation subset $I_j$ contains at most one such variable.
By a union bound, the conclusion of Lemma 8 holds simultaneously with the conclusions
of Lemma 4 and Lemma 7 with probability at least 9/10.

The sixth statement (that $p$ and $p'$ are $\tau$-close) is a consequence of Lemma 6 (since
$p'$ is obtained from $p$ by setting to 0 variables with variation less than $\alpha < \tau/(2s^2)$).
This concludes the proof of Theorem 3. ■

B Completeness of the test 

In this section we show that Test-Sparse-Poly is complete: 

Theorem 6. Suppose $f$ is an s-sparse GF(2) polynomial. Then Test-Sparse-Poly accepts
$f$ with probability at least 2/3.

Proof. Fix $f$ to be an s-sparse GF(2) polynomial over $\{0,1\}^n$. By the choice of the
$\Delta$ and $r$ parameters in Step 1 of Test-Sparse-Poly we may apply Theorem 3, so with
failure probability at most 1/10 over the choice of $\alpha$ and $I_1, \ldots, I_r$ in Steps 2 and 3,
statements 1-6 of Theorem 3 all hold. We shall write $f'$ to denote
$f|_{0 \leftarrow \bigcup_{j \in L(\alpha)} I_j}$. Note that at each successive stage of the proof we shall assume
that the "failure probability" events do not occur, i.e. henceforth we shall assume that
statements 1-6 all hold for $f$; we take a union bound over all failure probabilities at
the end of the proof.

Now consider the $M$ executions of the independence test for a given fixed $I_j$ in
Step 4. Lemma 2 gives that each run rejects with probability $\frac{1}{2}\mathrm{Vr}_f(I_j)$. A standard
Hoeffding bound implies that for the algorithm's choice of $M = \frac{2}{\Delta^2} \ln(200r)$, the
value $\widetilde{\mathrm{Vr}}_f(I_j)$ obtained in Step 4 is within $\pm\Delta$ of the true value $\mathrm{Vr}_f(I_j)$ with
failure probability at most $1/(100r)$. A union bound over all $j \in [r]$ gives that with
failure probability at most 1/100, we have that each $\widetilde{\mathrm{Vr}}_f(I_j)$ is within an additive
$\pm\Delta$ of the true value $\mathrm{Vr}_f(I_j)$. This means that (by statement 2 of Theorem 3) every
$I_j$ has $\widetilde{\mathrm{Vr}}_f(I_j) \notin [\alpha - 2\Delta, \alpha + 3\Delta]$, and hence in Step 5 of the test, the sets
$\widetilde{L}(\alpha)$ and $\widetilde{H}(\alpha)$ are identical to $L(\alpha)$ and $H(\alpha)$ respectively, which in turn
means that the function $f'$ defined in Step 5 is identical to $f'$ defined above.
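The estimation step can be sketched as follows. This is an illustration under assumptions rather than the paper's Step 4 verbatim: the independence test is taken to fix the coordinates outside $I$ at random, resample the coordinates in $I$ twice, and reject when $f$ differs (so that it rejects with probability half the variation, as Lemma 2 states), and the number of runs is taken to be $M = \lceil (2/\Delta^2) \ln(200r) \rceil$, one choice for which the Hoeffding calculation goes through. All function names are hypothetical.

```python
import math
import random


def independence_test(f, n, I, rng):
    """One run: random setting outside I, two independent settings inside I; reject if f differs."""
    w = [rng.randint(0, 1) for _ in range(n)]
    x, y = list(w), list(w)
    for i in I:
        x[i] = rng.randint(0, 1)
        y[i] = rng.randint(0, 1)
    return f(x) != f(y)


def num_runs(Delta, r):
    """Assumed run count: Hoeffding with deviation Delta/2 on the rejection rate."""
    return math.ceil((2 / Delta ** 2) * math.log(200 * r))


def estimate_variation(f, n, I, M, rng):
    """Estimate the variation of f on I as twice the empirical rejection rate over M runs."""
    rejections = sum(independence_test(f, n, I, rng) for _ in range(M))
    return 2 * rejections / M


def dictator(x):
    """Example +/-1-valued function that depends only on x[0]."""
    return 1 - 2 * x[0]
```

For the example function, a subset that avoids coordinate 0 is never rejected, so its estimate is exactly 0, while the estimate for $I = \{0\}$ concentrates around 1.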

We now turn to Step 6 of the test. By statement 6 of Theorem 3 we have that $f$
and $f'$ disagree on at most a $\tau$ fraction of inputs. A union bound over the $m$ random
examples drawn in Step 6 implies that with failure probability at most $\tau m < 1/100$ the
test proceeds to Step 7.

By statement 3 of Theorem 3 we have that each $I_j$, $j \in \widetilde{H}(\alpha) = H(\alpha)$, contains
precisely one high-variation element $i_j$ (i.e. which satisfies $\mathrm{Vr}_f(i_j) \ge \alpha$), and these
are all of the high-variation elements. Consider the set of these $|H(\alpha)|$ high-variation
variables; statement 5 of Theorem 3 implies that these are the only variables which $f'$
can depend on (it is possible that it does not depend on some of these variables). Let
us write $f''$ to denote the function $f'' : \{0,1\}^{|H(\alpha)|} \to \{-1,1\}$ corresponding to $f'$ but
whose input variables are these $|H(\alpha)|$ high-variation variables in $f$, one per $I_j$ for
each $j \in H(\alpha)$. We thus have that $f''$ is isomorphic to $f'$ (obtained from $f'$ by
discarding irrelevant variables).

The main idea behind the completeness proof is that in Step 7 of Test-Sparse-Poly,
the learning algorithm LearnPoly' is being run with target function $f''$. Since $f''$ is
isomorphic to $f'$, which is an s-sparse polynomial (since it is a restriction of an
s-sparse polynomial $f$), with high probability LearnPoly' will run successfully and the
test will accept. To show that this is what actually happens, we must show that with
high probability each call to SimMQ which LearnPoly' makes correctly simulates the
corresponding membership query to $f''$. This is established by the following lemmas:

Lemma 9. Let $f, I, \alpha, \Delta$ be such that $I$ is $(\alpha, \Delta)$-well-structured with
$\Delta \le \alpha\delta/(2\ln(2/\delta))$. Then with probability at least $1 - \delta$, the output of
SHIV($f, I, \alpha, \Delta, b, \delta$) is an assignment $w \in \{0,1\}^{I}$ which has $w_i = b$.

Proof. We assume that $I^b$ contains the high-variation variable $i$ (the other case being
very similar). Recall that by Lemma 2, each run of the independence test on $I^b$ rejects
with probability $\frac{1}{2}\mathrm{Vr}_f(I^b)$; by Lemma 1 (monotonicity) this is at least
$\frac{1}{2}\mathrm{Vr}_f(i) \ge \alpha/2$. So the probability that $I^b$ is not marked even once after $c$
iterations of the independence test is at most $(1 - \alpha/2)^c \le \delta/2$, by our choice of $c$.
Similarly, the probability that $I^{1-b}$ is ever marked during $c$ iterations of the
independence test is at most $c(\Delta/2) \le \delta/2$, by the condition of the lemma. Thus, the
probability of failing at step 3 of SHIV is at most $\delta$, and since $i \in I^b$, the assignment
$w$ sets variable $i$ correctly in step 4. ■
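The two bounds in this proof can be checked numerically. The sketch assumes the iteration count $c = (2/\alpha)\ln(2/\delta)$, which is the setting recalled in the proof of Lemma 13 (treated here as a real number rather than rounded up), and takes $\Delta$ at the largest value Lemma 9 allows; the function names are hypothetical.

```python
import math


def shiv_iterations(alpha, delta):
    """Assumed number of independence-test iterations: c = (2 / alpha) * ln(2 / delta)."""
    return (2 / alpha) * math.log(2 / delta)


def lemma9_failure_terms(alpha, Delta, delta):
    """Return (miss, false_mark): the two failure terms bounded in the proof of Lemma 9."""
    c = shiv_iterations(alpha, delta)
    miss = (1 - alpha / 2) ** c     # the part holding the high-variation variable is never marked
    false_mark = c * Delta / 2      # union bound: the other part is marked at least once
    return miss, false_mark


alpha, delta = 0.1, 0.05
Delta_max = alpha * delta / (2 * math.log(2 / delta))   # largest Delta allowed by Lemma 9
miss, false_mark = lemma9_failure_terms(alpha, Delta_max, delta)
```

Both terms come out at or below $\delta/2$, with the second one meeting the bound with equality at this extreme value of $\Delta$.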

Lemma 10. With total failure probability at most 1/100, each of the
$Q(s, |H(\alpha)|, \epsilon/4, 1/100)$ calls to
SimMQ($f$, $H(\alpha)$, $\{I_j\}_{j \in H(\alpha)}$, $\alpha$, $\Delta$, $z$, $1/(100\,Q(s, |H(\alpha)|, \epsilon/4, 1/100))$)
that LearnPoly' makes in Step 7 of Test-Sparse-Poly returns the correct value of
$f''(z)$.

Proof. Consider a single call to the procedure
SimMQ($f$, $H(\alpha)$, $\{I_j\}_{j \in H(\alpha)}$, $\alpha$, $\Delta$, $z$, $1/(100\,Q(s, |H(\alpha)|, \epsilon/4, 1/100))$)
made by LearnPoly'. We show that with failure probability at most
$\delta' := 1/(100\,Q(s, |H(\alpha)|, \epsilon/4, 1/100))$ this call returns the value $f''(z)$, and the
lemma then follows by a union bound over the $Q(s, |H(\alpha)|, \epsilon/4, 1/100)$ many calls to
SimMQ.

This call to SimMQ makes $|H(\alpha)|$ calls to SHIV($f$, $I_j$, $\alpha$, $\Delta$, $z_j$, $\delta'/|H(\alpha)|$),
one for each $j \in H(\alpha)$. Consider any fixed $j \in H(\alpha)$. Statement 3 of Theorem 3 gives
that $I_j$ ($j \in H(\alpha)$) is $(\alpha, \Delta)$-well-structured. Since $\alpha > \tau/(4s^2)$, it is easy to
check the condition of Lemma 9 holds where the role of "$\delta$" in that inequality is played
by $\delta'/|H(\alpha)|$, so we may apply Lemma 9 and conclude that with failure probability at
most $\delta'/|H(\alpha)|$ (recall that by statement 4 of Theorem 3 we have
$|H(\alpha)| \le s \log(8s^3/\tau)$), SHIV returns an assignment to the variables in $I_j$ which sets
the high-variation variable to $z_j$ as required. By a union bound, the overall failure
probability that any $I_j$ ($j \in H(\alpha)$) has its high-variation variable not set according
to $z$ is at most $\delta'$. Now statement 5 and the discussion preceding this lemma (the
isomorphism between $f'$ and $f''$) give that SimMQ sets all of the variables that are
relevant in $f'$ correctly according to $z$ in the assignment $x$ it constructs in Step 2.
Since this assignment $x$ sets all variables in $\bigcup_{j \in L(\alpha)} I_j$ to 0, the bit $b = f(x)$ that
is returned is the correct value of $f''(z)$, with failure probability at most $\delta'$ as
required. ■

With Lemma 10 in hand, we have that with failure probability at most 1/100, the
execution of LearnPoly'($s$, $|H(\alpha)|$, $\epsilon/4$, $1/100$) in Step 7 of Test-Sparse-Poly
correctly simulates all membership queries. As a consequence, Corollary 1 thus gives
that LearnPoly'($s$, $|H(\alpha)|$, $\epsilon/4$, $1/100$) returns "not s-sparse" with probability at most
1/100. Summing all the failure probabilities over the entire execution of the algorithm,
the overall probability that Test-Sparse-Poly does not output "yes" is at most

$$\underbrace{1/10}_{\text{Theorem 3}} + \underbrace{1/100}_{\text{Step 4}} + \underbrace{1/100}_{\text{Step 6}} + \underbrace{1/100}_{\text{Lemma 10}} + \underbrace{1/100}_{\text{Corollary 1}} < 1/5,$$

and the completeness theorem is proved. (Theorem 6) ■
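The final tally can be reproduced with exact rational arithmetic. The labels below are the five sources of failure named in the proof; the dictionary itself is only an illustrative bookkeeping device.

```python
from fractions import Fraction

# Failure probability charged to each stage of the completeness argument.
failure = {
    "Theorem 3": Fraction(1, 10),
    "Step 4": Fraction(1, 100),
    "Step 6": Fraction(1, 100),
    "Lemma 10": Fraction(1, 100),
    "Corollary 1": Fraction(1, 100),
}

total_failure = sum(failure.values())   # union bound over all stages
accept_probability_lower_bound = 1 - total_failure
```

The total is 7/50, so the acceptance probability is at least 43/50, comfortably above the 2/3 claimed in Theorem 6.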



C Soundness of the Test 

In this section we prove the soundness of Test-Sparse-Poly: 

Theorem 7. If $f$ is $\epsilon$-far from any s-sparse polynomial, then Test-Sparse-Poly accepts
with probability at most 1/3.

Proof. To prove the soundness of the test, we start by assuming that the function $f$ has
progressed to Step 5, so there are subsets $I_1, \ldots, I_r$ and $H(\alpha)$ satisfying
$\widetilde{\mathrm{Vr}}_f(I_j) > \alpha + 2\Delta$ for all $j \in H(\alpha)$. As in the proof of completeness, we have
that the actual variations of all subsets should be close to the estimates, i.e. that
$\mathrm{Vr}_f(I_j) > \alpha + \Delta$ for all $j \in H(\alpha)$ except with probability at most 1/100. We may
then complete the proof in two parts by establishing the following:

- If $f$ and $f'$ are $\epsilon_a$-far, Step 6 will accept with probability at most $\delta_a$.

- If $f'$ is $\epsilon_b$-far from every s-sparse polynomial, Step 7 will accept with probability
at most $\delta_b$.

Establishing these statements with $\epsilon_a = \epsilon_b = \epsilon/2$, $\delta_a = 1/12$ and $\delta_b = 1/6$ will
allow us to complete the proof (and we may assume throughout the rest of the proof
that $\mathrm{Vr}_f(I_j) > \alpha$ for each $j \in H(\alpha)$).

The first statement follows immediately by our choice of $m = \frac{1}{\epsilon_a} \ln \frac{1}{\delta_a}$ with
$\epsilon_a = \epsilon/2$ and $\delta_a = 1/12$ in Step 6. Our main task is to establish the second
statement, which we do using Lemmas 11 and 12 stated below. Intuitively, we would like to
show that if LearnPoly' outputs a hypothesis $h$ (which must be an s-sparse polynomial
since LearnPoly' is proper) with probability greater than 1/6, then $f'$ is close to a junta
isomorphic to $h$. To do this, we establish that if LearnPoly' succeeds with high
probability, then the last hypothesis on which an equivalence query is performed in
LearnPoly' is a function which is close to $f'$. Our proof uses two lemmas: Lemma 12 tells
us that this holds if the high-variation subsets satisfy a certain structure, and Lemma 11
tells us that if LearnPoly' succeeds with high probability then the subsets indeed satisfy
this structure. We now state these lemmas formally and complete the proof of the theorem,
deferring the proofs of the lemmas until later.

Recall that the algorithm LearnPoly' will make repeated calls to SimMQ, which in
turn makes repeated calls to SHIV. Lemma 11 states that if, with probability greater
than $\delta_2$, all of these calls to SHIV return without failure, then the subsets associated
with $H(\alpha)$ have a special structure.

Lemma 11. Let $J \subseteq [n]$ be a subset of variables obtained by including the
highest-variation element in $I_j$ for each $j \in H(\alpha)$ (breaking ties arbitrarily). Suppose
that $k \ge 300|H(\alpha)|/\epsilon_2$ queries are made to SimMQ. Suppose moreover that
Pr[every call to SHIV that is made during these $k$ queries returns without outputting
'fail'] is greater than $\delta_2$ for $\delta_2 = 1/\Omega(k)$. Then the following both hold:

- Every subset $I_j$ for $j \in H(\alpha)$ satisfies $\mathrm{Vr}_f(I_j \setminus J) \le 2\epsilon_2/|H(\alpha)|$; and

- The function $f'$ is $\epsilon_2$-close to the junta $g : \{0,1\}^{|H(\alpha)|} \to \{-1,1\}$ defined as:

$$g(x) := \mathrm{sign}\left(E_z\left[f'((x \cap J) \cup z)\right]\right).$$

Given that the subsets associated with $H(\alpha)$ have this special structure, Lemma 12 tells
us that the hypothesis output by LearnPoly' should be close to the junta $g$.

Lemma 12. Define $Q_E$ as the maximum number of calls to SimMQ that will be
made by LearnPoly' in all of its equivalence queries. Suppose that for every $j \in H(\alpha)$,
it holds that $\mathrm{Vr}_f(I_j \setminus J) < 2\epsilon_2/|H(\alpha)|$ with $\epsilon_2 < \frac{\alpha}{800 Q_E}$. Then the probability
that LearnPoly' outputs a hypothesis $h$ which is $\epsilon/4$-far from the junta $g$ is at most
$\delta_3 = 1/100$.

We now show that Lemmas 11 and 12 suffice to prove the desired result. Suppose
that LearnPoly' accepts with probability at least $\delta_b = 1/6$. Assume LearnPoly' makes
at least $k$ queries to SimMQ (we address this in the next paragraph); then it follows
from Lemma 11 that the bins associated with $H(\alpha)$ satisfy the conditions of Lemma 12
and that $f'$ is $\epsilon_2$-close to the junta $g$. Now applying Lemma 12, we have that with
failure probability at most 1/100, LearnPoly' outputs a hypothesis which is $\epsilon/4$-close
to $g$. But then $f'$ must be $(\epsilon_2 + \epsilon/4)$-close to this hypothesis, which is an s-sparse
polynomial.

We need to establish that LearnPoly' indeed makes $k \ge 300|H(\alpha)|/\epsilon_2$ SimMQ
queries for an $\epsilon_2$ that satisfies the condition on $\epsilon_2$ in Lemma 12. (Note that if
LearnPoly' does not actually make this many queries, we can simply have it make artificial
calls to SHIV to achieve this. An easy extension of our completeness proof handles this
slight extension of the algorithm; we omit the details.) Since we need
$\epsilon_2 < \alpha/(800 Q_E)$ and Theorem 2 gives us that
$Q_E = (|H(\alpha)|s + 2) \cdot \frac{4}{\epsilon} \ln 300(|H(\alpha)|s + 2)$ (each equivalence query is simulated
using $\frac{4}{\epsilon} \ln 300(|H(\alpha)|s + 2)$ random examples), an easy computation shows that it
suffices to take $k = \mathrm{poly}(s, 1/\epsilon)$, and the proof of Theorem 7 is complete.

■ 

Before proving Lemma 12 and Lemma 11, we prove the following about the behavior
of SHIV when it is called with parameters $\alpha$, $\Delta$ that do not quite match the real
values $\alpha'$, $\Delta'$ for which $I$ is $(\alpha', \Delta')$-well-structured:

Lemma 13. If $I$ is $(\alpha', \Delta')$-well-structured, then the probability that
SHIV($f, I, \alpha, \Delta, b, \delta$) passes (i.e. does not output "fail") and sets the
high-variation variable incorrectly is at most
$(\delta/2)^{\alpha'/\alpha} \cdot (1/\alpha) \cdot \Delta' \cdot \ln(2/\delta)$.

Proof. The only way for SHIV to pass with an incorrect setting of the high-variation
variable $i$ is if it fails to mark the subset containing $i$ for $c$ iterations of the
independence test, and marks the other subset at least once. Since $\mathrm{Vr}(i) \ge \alpha'$ and
$\mathrm{Vr}(I \setminus i) \le \Delta'$, the probability of this occurring is at most
$(1 - \alpha'/2)^c \cdot \Delta' \cdot c/2$. Since SHIV is called with failure parameter $\delta$, $c$ is set to
$\frac{2}{\alpha} \ln \frac{2}{\delta}$. ■
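The step from the expression in this proof to the bound stated in Lemma 13 uses $(1 - \alpha'/2)^c \le e^{-\alpha' c/2} = (\delta/2)^{\alpha'/\alpha}$ together with $c/2 = (1/\alpha)\ln(2/\delta)$. A minimal numeric sketch of that comparison, with hypothetical function names and sample parameter values chosen only for illustration:

```python
import math


def lemma13_proof_expression(alpha, delta, alpha_true, Delta_true):
    """(1 - alpha'/2)^c * Delta' * c / 2 with c = (2 / alpha) * ln(2 / delta)."""
    c = (2 / alpha) * math.log(2 / delta)
    return (1 - alpha_true / 2) ** c * Delta_true * c / 2


def lemma13_stated_bound(alpha, delta, alpha_true, Delta_true):
    """(delta/2)^(alpha'/alpha) * (1/alpha) * Delta' * ln(2/delta)."""
    return (delta / 2) ** (alpha_true / alpha) * (1 / alpha) * Delta_true * math.log(2 / delta)


# SHIV is called with alpha = 0.2, but the subset is only (0.15, 0.01)-well-structured.
expr = lemma13_proof_expression(0.2, 0.1, 0.15, 0.01)
bound = lemma13_stated_bound(0.2, 0.1, 0.15, 0.01)
```

The gap between the two values reflects only the slack in $1 - x \le e^{-x}$.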

We now give a proof of Lemma 12, followed by a proof of Lemma 11.

Proof. (Lemma 12) By assumption each $\mathrm{Vr}_f(I_j \setminus J) \le 2\epsilon_2/|H(\alpha)|$ and
$\mathrm{Vr}_f(I_j) > \alpha$, so subadditivity of variation gives us that for each $j \in H(\alpha)$, there
exists an $i \in I_j$ such that $\mathrm{Vr}_f(i) \ge \alpha - 2\epsilon_2/|H(\alpha)|$. Thus for every call to SHIV
made by SimMQ, the conditions of Lemma 13 are satisfied with
$\mathrm{Vr}_f(i) \ge \alpha - 2\epsilon_2/|H(\alpha)|$ and $\mathrm{Vr}_f(I_j \setminus J) < 2\epsilon_2/|H(\alpha)|$. We show that as long
as $\epsilon_2 < \frac{\alpha}{800 Q_E}$, the probability that any particular query $z$ to SimMQ has a
variable set incorrectly is at most $\delta_3/(3Q_E)$.

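Suppose SHIV has been called with failure probability $\delta_4$; then the probability given
by Lemma 13 is at most:

$$(\delta_4/2)^{1 - 2\epsilon_2/(\alpha|H(\alpha)|)} \cdot \frac{1}{\alpha} \ln\left(\frac{2}{\delta_4}\right) \cdot \frac{2\epsilon_2}{|H(\alpha)|}. \qquad (1)$$

We shall show that this is at most $\delta_3/(3|H(\alpha)|Q_E) = 1/(300 Q_E |H(\alpha)|)$. Taking
$\epsilon_2 \le \alpha/(800 Q_E)$ simplifies (1) to:

$$\frac{1}{300 Q_E |H(\alpha)|} \cdot (\delta_4/2)^{1 - 2\epsilon_2/(\alpha|H(\alpha)|)} \cdot \frac{3}{4} \ln\frac{2}{\delta_4},$$

which is at most $1/(300 |H(\alpha)| Q_E)$ as long as

$$(2/\delta_4)^{1 - 2\epsilon_2/(\alpha|H(\alpha)|)} > \frac{3}{4} \ln\frac{2}{\delta_4},$$

which certainly holds for our choice of $\epsilon_2$ and the setting of $\delta_4 = 1/(100k|H(\alpha)|)$.
Each call to SimMQ uses $|H(\alpha)|$ calls to SHIV, so a union bound gives that each random
query to SimMQ returns an incorrect assignment with probability at most $1/(300 Q_E)$.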

Now, since $f'$ and $g$ are $\epsilon_2$-close and $\epsilon_2$ satisfies $\epsilon_2 Q_E \le \delta_3/3$, in the uniform
random samples used to simulate the final (accepting) equivalence query, LearnPoly'
will receive examples labeled correctly according to $g$ with probability at least
$1 - 2\delta_3/3$. Finally, note that LearnPoly' makes at most $|H(\alpha)|s + 2$ equivalence queries
and hence each query is simulated using $\frac{4}{\epsilon} \ln 300(|H(\alpha)|s + 2)$ random examples (for
a failure probability of $\frac{1}{300(|H(\alpha)|s + 2)}$ for each equivalence query). Then LearnPoly'
will reject with probability at least $1 - \delta_3/3$ unless $g$ and $h$ are $\epsilon/4$-close. This
concludes the proof of Lemma 12. ■

Proof. (Lemma 11) We prove that if $\mathrm{Vr}_f(I_j \setminus J) > 2\epsilon_2/|H(\alpha)|$ for some $j \in H(\alpha)$,
then the probability that all calls to SHIV return successfully is at most $\delta_2$. The
closeness of $f'$ and $g$ follows easily by the subadditivity of variation and Proposition 3.2
of [FKR+04].

First, we prove a much weaker statement whose analysis and conclusion will be
used to prove the proposition. We show in Proposition 1 that if the test accepts with
high probability, then the variation from each variable in any subset is small. We use
the bound on each variable's variation to obtain the concentration result in Proposition
2, and then complete the proof of Lemma 11.

Proposition 1. Suppose that $k$ calls to SHIV are made with a particular subset $I$, and
let $i$ be the variable with the highest variation in $I$. If $\mathrm{Vr}_f(j) > \epsilon_2/(100|H(\alpha)|)$ for
some $j \in I \setminus i$, then the probability that SHIV returns without outputting 'fail' for all
$k$ calls is at most $\delta^* = e^{-k/18} + e^{-c}$.

Proof. Suppose that there exist $j, j' \in I$ with $\mathrm{Vr}_f(j) \ge \mathrm{Vr}_f(j') \ge \epsilon_2/(100|H(\alpha)|)$. A
standard Chernoff bound gives that except with probability at most $e^{-k/18}$, for at least
$(1/3)k$ of the calls to SHIV, variables $j$ and $j'$ are in different partitions. In these cases,
the probability SHIV does not output 'fail' is at most $2(1 - \epsilon_2/(100|H(\alpha)|))^{c}$, since for
each of the $c$ runs of the independence test, one of the partitions must not be marked.
The probability no call outputs 'fail' is at most
$e^{-k/18} + 2(1 - \epsilon_2/(100|H(\alpha)|))^{ck/3}$. Our choice of $k \ge 300|H(\alpha)|/\epsilon_2$ ensures that
$(1/e)^{ck\epsilon_2/(300|H(\alpha)|)} \le (1/e)^{c}$. ■

Since in our setting $|I_j|$ may depend on $n$, using the monotonicity of variation with
the previous claim does not give a useful bound on $\mathrm{Vr}_f(I \setminus i)$. But we see from the
proof that if the variation of each partition is not much less than $\mathrm{Vr}_f(I \setminus i)$ and
$\mathrm{Vr}_f(I \setminus i) > 2\epsilon_2/|H(\alpha)|$, then with enough calls to SHIV one of these calls should
output "fail." Hence the lemma will be easily proven once we establish the following
proposition:

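Proposition 2. Suppose that $k$ calls to SHIV are made with a particular subset $I$ having
$\mathrm{Vr}_f(I \setminus i) > 2\epsilon_2/|H(\alpha)|$ and $\mathrm{Vr}_f(j) < \epsilon_2/(100|H(\alpha)|)$ for every $j \in I \setminus i$. Then with
probability greater than $1 - \delta^{**} = 1 - e^{-k/18}$, at least 1/3 of the $k$ calls to SHIV yield
both $\mathrm{Vr}_f(I^1) > \eta\,\mathrm{Vr}_f(I \setminus i)/2$ and $\mathrm{Vr}_f(I^0) > \eta\,\mathrm{Vr}_f(I \setminus i)/2$, where $\eta = 1/e - 1/50$.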

Proof. We would like to show that a random partition of $I$ into two parts will result in
parts each of which has variation not much less than the variation of $I \setminus i$. Choosing a
partition is equivalent to choosing a random subset $I'$ of $I \setminus i$ and including $i$ in $I'$ or
$I \setminus I'$ with equal probability. Thus it suffices to show that for random $I' \subseteq I \setminus i$, it is
unlikely that $\mathrm{Vr}_f(I')$ is much smaller than $\mathrm{Vr}_f(I \setminus i)$.

This does not hold in general, but by bounding the variation of any particular
variable in $I$, which we have done in Proposition 1, and computing the unique-variation
(a technical tool introduced in [FKR+04]) of $I'$, we may obtain a deviation bound on
$\mathrm{Vr}_f(I')$. The following statement follows from Lemma 3.4 of [FKR+04]:

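Proposition 3 ([FKR+04]). Define the unique-variation of variable $j$ (with respect to
$i$) as

$$\mathrm{Ur}_f(j) = \mathrm{Vr}_f([j] \setminus i) - \mathrm{Vr}_f([j-1] \setminus i).$$

Then for any $I' \subseteq I \setminus i$,

$$\mathrm{Vr}_f(I') \ge \sum_{j \in I'} \mathrm{Ur}_f(j) = \sum_{j \in I'} \big( \mathrm{Vr}_f([j] \setminus i) - \mathrm{Vr}_f([j-1] \setminus i) \big).$$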
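
To make the unique-variation bound concrete, the following Python sketch checks it by brute force on a toy function. It assumes the definition of variation from [FKR+04], namely Vr_f(I) = E_w[Var_z[f(w ⊔ z)]] for f valued in {-1, +1}, with w uniform outside I and z uniform on I. It also takes I to be the full variable set, indexed from 0. The helper names and the 4-variable example polynomial are hypothetical and chosen only for illustration.

```python
import itertools


def variation(f, n, subset):
    """Vr_f(I): average over assignments w outside I of the variance of f over I."""
    inside = sorted(subset)
    outside = [v for v in range(n) if v not in subset]
    total = 0.0
    for w in itertools.product((0, 1), repeat=len(outside)):
        values = []
        for z in itertools.product((0, 1), repeat=len(inside)):
            x = [0] * n
            for v, bit in zip(outside, w):
                x[v] = bit
            for v, bit in zip(inside, z):
                x[v] = bit
            values.append(f(x))
        mean = sum(values) / len(values)
        total += sum(v * v for v in values) / len(values) - mean ** 2
    return total / 2 ** len(outside)


def unique_variation(f, n, j, i):
    """Ur_f(j) = Vr_f([j] \\ i) - Vr_f([j-1] \\ i), with [j] = {0, ..., j}."""
    upto_j = set(range(j + 1)) - {i}
    before_j = set(range(j)) - {i}
    return variation(f, n, upto_j) - variation(f, n, before_j)


def check_unique_variation_bound(f, n, i):
    """Check Vr_f(I') >= sum of Ur_f(j) over j in I', for every I' avoiding i."""
    others = [v for v in range(n) if v != i]
    ur = {j: unique_variation(f, n, j, i) for j in others}
    for r in range(len(others) + 1):
        for sub in itertools.combinations(others, r):
            if variation(f, n, set(sub)) < sum(ur[j] for j in sub) - 1e-12:
                return False
    return True


def toy_f(x):
    """(-1)^p(x) for the sparse GF(2) polynomial p = x0*x1 + x2 + x1*x3."""
    return 1 - 2 * ((x[0] & x[1]) ^ x[2] ^ (x[1] & x[3]))


if __name__ == "__main__":
    print(check_unique_variation_bound(toy_f, 4, i=0))
```

The unique-variations telescope, so summing Ur_f(j) over all j recovers the variation of the full set without i. The same sketch shows that Ur_f(i) = 0, since the two prefix sets coincide once i is removed.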
Now $\mathrm{Vr}_f(I')$ is lower bounded by a sum of independent, non-negative random variables
whose expectation is given by

$$\mathbb{E}\Big[\sum_{j \in I'} \mathrm{Ur}_f(j)\Big] = \frac{1}{2} \sum_{j=1}^{n} \mathrm{Ur}_f(j) = \frac{\mathrm{Vr}_f(I \setminus i)}{2} =: \mu.$$

To obtain a concentration property, we require a bound on each $\mathrm{Ur}_f(j) \le \mathrm{Vr}_f(j)$,
which is precisely what we showed in the previous proposition. Note that $\mathrm{Ur}_f(i) = 0$,
and recall that we have assumed that $\mu > \epsilon_2/|H(\alpha)|$ and every $j \in I \setminus i$ satisfies
$\mathrm{Vr}_f(j) < \mu/100$.

Now we may use the bound from [FKR+04] in Proposition 3.5 with $\eta = 1/e - 2/100$
to obtain:

$$\Pr\Big[\sum_{j \in I'} \mathrm{Ur}_f(j) < \eta\mu\Big] < \exp\Big(\frac{100}{e}(\eta e - 1)\Big) \le 1/e^2.$$

Thus the probability that one of $I^0$ and $I^1$ has variation less than $\eta\mu$ is at most 1/2.
We expect that half of the $k$ calls to SHIV will result in $I^0$ and $I^1$ having variation at
least $\eta\mu$, so a Chernoff bound completes the proof of the claim with $\delta^{**} < e^{-k/18}$. This
concludes the proof of Proposition 2. ■
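
The constants in this step can be checked numerically. The sketch below assumes the tail bound of Proposition 3.5 of [FKR+04] has the form Pr[X < eta * alpha] < exp((alpha / (e t)) (eta e - 1)) for a sum X of independent non-negative variables each bounded by t, with alpha = E[X]. That form is recalled from the cited paper and should be checked against it. The function name is hypothetical.

```python
import math


def fkr_tail_exponent(eta: float, ratio: float) -> float:
    """Exponent (alpha / (e * t)) * (eta * e - 1), where ratio = alpha / t."""
    return (ratio / math.e) * (eta * math.e - 1)


# Here alpha = mu and t = mu / 100, so ratio = 100.
eta = 1 / math.e - 2 / 100
exponent = fkr_tail_exponent(eta, ratio=100.0)

one_part_bound = math.exp(exponent)   # bound for a single part of the partition
either_part_bound = 2 * one_part_bound  # union bound over the two parts

if __name__ == "__main__":
    print(exponent, one_part_bound, either_part_bound, eta > 0.25)
```

Since eta * e - 1 = -2e/100, the exponent equals -2 up to rounding. The union bound 2/e^2 is then below 1/2. The same value of eta exceeds 1/4, which is the fact used when the proof of the lemma replaces eta * epsilon_2/|H(alpha)| by epsilon_2/(4|H(alpha)|).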

Finally, we proceed to prove the lemma. Suppose that there exists some $I$ such that
$\mathrm{Vr}_f(I \setminus i) > 2\epsilon_2/|H(\alpha)|$. Now the probability that a particular call to SHIV with subset
$I$ succeeds is:

$$\Pr[\mathrm{marked}(I^0);\ \neg\,\mathrm{marked}(I^1)] + \Pr[\mathrm{marked}(I^1);\ \neg\,\mathrm{marked}(I^0)].$$

By Propositions 1 and 2, if with probability at least $\delta^* + \delta^{**}$ none of the $k$ calls to
SHIV return fail, then for $k/3$ runs of SHIV both $\mathrm{Vr}_f(I^1)$ and $\mathrm{Vr}_f(I^0)$ are at least
$\eta\epsilon_2/|H(\alpha)| > \epsilon_2/(4|H(\alpha)|)$ and thus both probabilities are at most $(1 - \epsilon_2/(4|H(\alpha)|))^c$.

As in the analysis of the first proposition, we may conclude that every subset $I$
which is called with SHIV at least $k$ times either satisfies $\mathrm{Vr}_f(I \setminus i) < 2\epsilon_2/|H(\alpha)|$ or
will cause the test to reject with probability at least $1 - \delta^{**} - 2\delta^*$. Recall that
$\delta^* = e^{-c} + e^{-k/18}$; since SHIV is set to run with failure probability at most $1/(|H(\alpha)|k)$, we
have that $\delta_2$ is $1/\Omega(k)$. This concludes the proof of Lemma 11. ■



